Acknowledgments

The structure of these notes was inspired by that of the websites Forecasting: Principles and Practice, 2nd and 3rd editions, by Rob J Hyndman and George Athanasopoulos (see https://otexts.com/fpp3/about-the-authors.html).

knitr::include_url("https://otexts.com/fpp2/")
knitr::include_url("https://otexts.com/fpp3/")

We are indebted to Rob J Hyndman, George Athanasopoulos, and their coworkers for the wealth of ideas, data, and computational procedures they have made publicly available.

1 Introduction to Time Series

Let \(\mathbb{R}^{N}\) be the real \(N\)-dimensional Euclidean space, for some \(N\in\mathbb{N}\).

Definition 1.1 (N-variate real time series) We call an \(N\)-variate real time series any sequence \(\left(x_{t}\right)_{t\in T}\) of points in \(\mathbb{R}^{N}\), where \(T\subseteq\mathbb{R}\) is a finite set of time indices. We call the number of time indices in \(T\), denoted by \(\left\vert T\right\vert\), the length of the time series.
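As a concrete illustration (a sketch of our own; the names T_set, x1, and x2 are illustrative, not part of the definition), an \(N\)-variate time series can be stored in R as a \(\left\vert T\right\vert\times N\) numeric matrix with one row per time index:

```r
T_set <- 1:5                          # evenly spaced time indices
x <- matrix(c(1.0, 0.5,
              1.2, 0.4,
              0.9, 0.6,
              1.1, 0.5,
              1.3, 0.7),
            nrow=length(T_set), ncol=2, byrow=TRUE)
rownames(x) <- T_set                  # one row per time index t in T
colnames(x) <- c("x1", "x2")          # the N = 2 components of each point
nrow(x)                               # the length |T| of the time series
## [1] 5
```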

In principle, there are no restrictions on the set of time indices characterizing a time series: time indices may be unevenly spaced. However, in these notes, we will only consider sets of time indices whose elements are evenly spaced, so that consecutive indices identify time intervals of the same length (e.g., one second, one minute, one hour, one day, one week, one month, one quarter, one year, …).

The temporal ordering of the elements of a time series, given by the time indices, plays a crucial role. Unlike cross-sectional statistics, where the temporal order of data collection is irrelevant because all data are assumed to be collected simultaneously, or in such a way that time plays no role, in time series analysis the temporal order of data collection is always of the utmost importance, even in cases where one can prove it has no influence.

Let \(\left(x_{t}\right)_{t\in T}\equiv\mathbf{x}\) be an \(N\)-variate real time series.

Definition 1.2 (Graph of a time series) We call the graph of the time series \(\mathbf{x}\) the subset of \(\mathbb{R}\times\mathbb{R}^{N}\) given by \[\begin{equation} \Gamma_{\mathbf{x}}\equiv\left\{(t,x)\in\mathbb{R}\times\mathbb{R}^{N}: t\in T,\,x=x_{t}\right\}. \tag{1.1} \end{equation}\]

Let us consider some examples.

Example 1.1 (Standard Bernoulli time series) We flip a fair coin \(T\) times, for some \(T\in\mathbb{N}\), and represent by \(1\) [resp. \(0\)] the occurrence of the outcome heads [resp. tails] on the \(t\)th flip, for \(t=1,\dots,T\). We obtain a time series \(\left(x_{t}\right)_{t=1}^{T}\) of points in \(\mathbb{R}\), such that \(x_{t}=1\) or \(x_{t}=0\), for every \(t=1,\dots,T\).

Definition 1.3 (Standard Bernoulli time series) We call a time series of the type presented in Example 1.1 a standard Bernoulli time series of length \(T\).

In what follows, we will see that a standard Bernoulli time series can be thought of as a sample path of the standard Bernoulli process. The latter turns out to be a process with independent and identically distributed components. In particular, it is a strong-sense stationary and wide-sense ergodic process (see Definitions 6.3 and 8.8).
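Since such a process has independent and identically distributed components with mean \(p=1/2\), the sample mean of a long standard Bernoulli time series should settle near \(1/2\). A quick sanity check of our own (the seed and the length 10000 are arbitrary choices, not used elsewhere in these notes):

```r
set.seed(1)
x <- rbinom(n=10000, size=1, prob=0.5)  # a long standard Bernoulli time series
mean(x)                                 # close to the success parameter p = 0.5
```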

We build three standard Bernoulli time series Ber_r, Ber_g, and Ber_b (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Binomial.html; see also https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html).

length <- 150                                # Setting the length of the time series
set.seed(12345)                              # Setting the random seed "12345" for reproducibility.
Ber_r <- rbinom(n=length, size=1, prob=0.5)  # Simulating the flips of a fair coin by sampling 
                                             # from the standard Bernoulli distribution and storing
                                             # the simulation in the vector *Ber_r*.
class(Ber_r)                                 # Showing the class of the vector *Ber_r*.
## [1] "integer"
head(Ber_r, 30)                              # Showing the first 30 entries of the vector *Ber_r*.
##  [1] 1 1 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 0 1 1 0 0
set.seed(23451)   
Ber_g <- rbinom(n=length, size=1, prob=0.5)
head(Ber_g, 30)
##  [1] 1 0 1 1 1 0 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 0
set.seed(34512)  
Ber_b <- rbinom(n=length, size=1, prob=0.5) 
head(Ber_b, 30)
##  [1] 1 0 1 0 1 1 1 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 1 1 1 1 0 0 0

To get better insight into the structure of the time series, we consider the scatter plots of the data sets Ber_r, Ber_g, and Ber_b against their indexing variable. To this end, we begin by building a data frame Ber_df made up of the indexing variable and the data sets.

# Building a data frame from the time series and the indexing variable.
Ber_df <- data.frame(t=1:length, Ber_r=Ber_r, Ber_g=Ber_g, Ber_b=Ber_b) 
head(Ber_df)    # Showing the initial part of the data frame.
##   t Ber_r Ber_g Ber_b
## 1 1     1     1     1
## 2 2     1     0     0
## 3 3     1     1     1
## 4 4     1     1     0
## 5 5     0     1     1
## 6 6     0     0     1
tail(Ber_df)    # Showing the final part of the data frame.
##       t Ber_r Ber_g Ber_b
## 145 145     0     0     0
## 146 146     1     1     1
## 147 147     1     1     0
## 148 148     0     1     1
## 149 149     1     0     1
## 150 150     0     0     1

We then exploit the functions of the library ggplot2 to draw the scatter plots of the three standard Bernoulli time series.

Data_df <- Ber_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Standard Bernoulli Data Sets Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"success parameter ",~ p == 0.5))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 2
y_max <- max(Data_df$Ber_r,Data_df$Ber_b,Data_df$Ber_g)
y_min <- min(Data_df$Ber_r,Data_df$Ber_b,Data_df$Ber_g)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")

# library(ggplot2)
Ber_r_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Ber_r, color="col_1"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0),
        axis.text.x=element_text(angle=0, vjust=1))

Ber_g_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Ber_g, color="col_2"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1))

Ber_b_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Ber_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")

grid.arrange(Ber_r_sp,Ber_g_sp,Ber_b_sp, nrow=3, ncol=1, heights=c(0.36,0.28,0.36))

To save the plot as a .png or a .pdf file, one can uncomment and adapt the following code.

# # Setting the work directory. 
# setwd("C:/Users/.../Documents/...")
# dir()
# 
# # Saving the plot as a .png file
# file_name = paste("Standard Bernoulli Time Series Scatter Plot.png", sep="")
# png(file_name, width=1600, height=800, res=120)
# print(grid.arrange(Ber_r_sp,Ber_g_sp,Ber_b_sp, nrow=3, ncol=1, heights=c(0.37,0.31,0.32)))
# dev.off()
# 
# # Saving the plot as a .pdf file
# file_name = paste("Standard Bernoulli Time Series Scatter Plot.pdf", sep="")
# pdf(file_name, width=12, height=7)
# print(grid.arrange(Ber_r_sp,Ber_g_sp,Ber_b_sp, nrow=3, ncol=1, heights=c(0.37,0.31,0.32)))
# dev.off()

Example 1.2 (Sample path of the standard Bernoulli counting process) Consider the standard Bernoulli time series \(\left(x_{t}\right)_{t=1}^{T}\) of length \(T\), for some \(T\in\mathbb{N}\), and set \[\begin{equation} y_{t}\overset{\text{def}}{=}\sum_{s=1}^{t}x_{s}, \quad\forall t\in\left\{1,\dots,T\right\}. \tag{1.2} \end{equation}\] Then \(\left(y_{t}\right)_{t=1}^{T}\) is also a time series of points in \(\mathbb{R}\), such that \(y_{t}\) can be any number in \(\left\{0,1,\dots,t\right\}\), for every \(t=1,\dots,T\).

Definition 1.4 (Sample path of the standard Bernoulli counting process) We call a time series of the type presented in Example 1.2 a sample path of the standard Bernoulli counting process.

The standard Bernoulli counting process is a process with independent and identically distributed increments. More specifically, it is a random walk with drift and no deterministic trend (see Definition 11.2).
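Since the increments \(x_{s}\) have mean \(p=1/2\), the counting process drifts upward at rate \(1/2\): the expected value of \(y_{t}\) is \(t/2\). A quick check of our own on a simulated path (the seed and the length 10000 are arbitrary choices):

```r
set.seed(1)
x <- rbinom(n=10000, size=1, prob=0.5)  # a standard Bernoulli time series
y <- cumsum(x)                          # a sample path of the counting process
y[10000] / 10000                        # terminal value divided by t: close to the drift rate 0.5
```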

We build three sample paths of the standard Bernoulli counting process from the three standard Bernoulli time series Ber_r, Ber_g, and Ber_b.

Ber_cp_r <- cumsum(Ber_r) # The command "cumsum(x)" computes the cumulative sums 
Ber_cp_g <- cumsum(Ber_g) # of the entries of the vector "x".
Ber_cp_b <- cumsum(Ber_b)

We add the data sets Ber_cp_r, Ber_cp_g, and Ber_cp_b to the data frame Ber_df.

# library(tibble)
Ber_df <- add_column(Ber_df, Ber_cp_r=Ber_cp_r, Ber_cp_g=Ber_cp_g, Ber_cp_b=Ber_cp_b, .after="Ber_b")
head(Ber_df)
##   t Ber_r Ber_g Ber_b Ber_cp_r Ber_cp_g Ber_cp_b
## 1 1     1     1     1        1        1        1
## 2 2     1     0     0        2        1        1
## 3 3     1     1     1        3        2        2
## 4 4     1     1     0        4        3        2
## 5 5     0     1     1        4        4        3
## 6 6     0     0     1        4        4        4

We draw the scatter plots of the three sample paths, Ber_cp_r, Ber_cp_g, and Ber_cp_b, of the standard Bernoulli counting process.

Data_df <- Ber_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Sample Paths of the Standard Bernoulli Counting Process Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"success parameter ",~ p == 0.5))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 5
y_max <- max(Data_df$Ber_cp_r,Data_df$Ber_cp_g,Data_df$Ber_cp_b)
y_min <- min(Data_df$Ber_cp_r,Data_df$Ber_cp_g,Data_df$Ber_cp_b)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Ber_cp_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Ber_cp_r, color="col_1")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Ber_cp_g, color="col_2")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Ber_cp_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Ber_cp_sp)

Example 1.3 (Standard Rademacher time series) We flip again a fair coin \(T\) times, for some \(T\in\mathbb{N}\), but this time we represent by \(1\) [resp. \(-1\)] the occurrence of the outcome heads [resp. tails] on the \(t\)th flip, for \(t=1,\dots,T\). We obtain a time series \(\left(x_{t}\right)_{t=1}^{T}\) of points in \(\mathbb{R}\), such that \(x_{t}=1\) or \(x_{t}=-1\), for every \(t=1,\dots,T\).

Definition 1.5 (Standard Rademacher time series) We call a time series of the type presented in Example 1.3 a standard Rademacher time series with success parameter \(p=1/2\).

We build three standard Rademacher time series Rad_r, Rad_g, and Rad_b.

length <- 150                                    
set.seed(12345)                                  
Rad_r <- 2*rbinom(n=length, size=1, prob=0.5)-1  # Simulating the flips of a Rademacher fair coin, 
                                                 # by sampling from the standard Bernoulli distribution
                                                 # and applying the rule R=2*B-1.
head(Rad_r, 30)                                  # Showing the first 30 entries of the vector *Rad_r*.
##  [1]  1  1  1  1 -1 -1 -1  1  1  1 -1 -1  1 -1 -1 -1 -1 -1 -1  1 -1 -1  1  1  1
## [26] -1  1  1 -1 -1
set.seed(23451)   
Rad_g <- 2*rbinom(n=length, size=1, prob=0.5)-1
head(Rad_g, 30)
##  [1]  1 -1  1  1  1 -1  1 -1 -1 -1  1 -1  1  1  1 -1 -1  1  1  1 -1 -1 -1 -1 -1
## [26] -1 -1  1  1 -1
set.seed(34512)  
Rad_b <- 2*rbinom(n=length, size=1, prob=0.5)-1 
head(Rad_b, 30)
##  [1]  1 -1  1 -1  1  1  1 -1 -1  1  1 -1 -1  1 -1 -1 -1  1 -1 -1  1 -1  1  1  1
## [26]  1  1 -1 -1 -1

We consider the scatter plots of the data sets Rad_r, Rad_g, and Rad_b against their indexing variable. To this end, we start by building a data frame Rad_df made up of the indexing variable and the data sets.

Rad_df <- data.frame(t=1:length, Rad_r=Rad_r, Rad_g=Rad_g, Rad_b=Rad_b) 
head(Rad_df)
##   t Rad_r Rad_g Rad_b
## 1 1     1     1     1
## 2 2     1    -1    -1
## 3 3     1     1     1
## 4 4     1     1    -1
## 5 5    -1     1     1
## 6 6    -1    -1     1

We draw the scatter plot of the standard Rademacher time series.

Data_df <- Rad_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Standard Rademacher Time Series Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"success parameter ",~ p == 0.5))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 2
y_max <- max(Data_df$Rad_r,Data_df$Rad_b,Data_df$Rad_g)
y_min <- min(Data_df$Rad_r,Data_df$Rad_b,Data_df$Rad_g)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- floor(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Rad_r_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Rad_r, color="col_1"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0),
        axis.text.x=element_text(angle=0, vjust=1))

Rad_g_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Rad_g, color="col_2"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1))

Rad_b_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Rad_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")

grid.arrange(Rad_r_sp,Rad_g_sp,Rad_b_sp, nrow=3, ncol=1, heights=c(0.37,0.31,0.32))

In what follows, we will see that a standard Rademacher time series can be thought of as a sample path of the standard Rademacher process. The latter turns out to be a strong white noise process (see Definition 9.1).

Example 1.4 (Sample path of the standard Rademacher random walk) Consider the standard Rademacher time series \(\left(x_{t}\right)_{t=1}^{T}\) of length \(T\), for some \(T\in\mathbb{N}\), and set \[\begin{equation} y_{t}\overset{\text{def}}{=}\sum_{s=1}^{t}x_{s}, \quad\forall t\in\left\{1,\dots,T\right\}. \tag{1.3} \end{equation}\] Then \(\left(y_{t}\right)_{t=1}^{T}\) is also a time series of points in \(\mathbb{R}\), such that \(y_{t}\) can be any number in \(\left\{-t,-t+2,\dots,t-2,t\right\}\), i.e., any integer in \(\left[-t,t\right]\) with the same parity as \(t\), for every \(t=1,\dots,T\).
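Note that the values of \(y_{t}\) are constrained more tightly than the interval \(\left[-t,t\right]\) alone suggests: each step changes \(y\) by \(\pm 1\), so \(t+y_{t}\) is always even. This parity can be checked directly on a simulated path (a sketch of our own; the seed and the length are arbitrary choices):

```r
set.seed(1)
x <- 2*rbinom(n=1000, size=1, prob=0.5) - 1  # a standard Rademacher time series
y <- cumsum(x)                               # a sample path of the random walk
t <- seq_along(y)
all((t + y) %% 2 == 0)                       # t + y_t is always even
## [1] TRUE
```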

Definition 1.6 (Sample path of the standard Rademacher counting process) We call a time series of the type presented in Example 1.4 a sample path of the standard Rademacher counting process.

The standard Rademacher counting process is a process with independent and identically distributed increments. More specifically, it is a random walk with no drift and no deterministic trend (see Definition 11.1). For this property, it is also called the standard Rademacher random walk.

We build three sample paths of the Rademacher random walk from the three standard Rademacher time series Rad_r, Rad_g, and Rad_b.

Rad_rw_r <- cumsum(Rad_r)
Rad_rw_g <- cumsum(Rad_g)
Rad_rw_b <- cumsum(Rad_b)

We add the data sets Rad_rw_r, Rad_rw_g, and Rad_rw_b to the data frame Rad_df.

# library(tibble)
Rad_df <- add_column(Rad_df, Rad_rw_r=Rad_rw_r, Rad_rw_g=Rad_rw_g, Rad_rw_b=Rad_rw_b, .after="Rad_b")
head(Rad_df)
##   t Rad_r Rad_g Rad_b Rad_rw_r Rad_rw_g Rad_rw_b
## 1 1     1     1     1        1        1        1
## 2 2     1    -1    -1        2        0        0
## 3 3     1     1     1        3        1        1
## 4 4     1     1    -1        4        2        0
## 5 5    -1     1     1        3        3        1
## 6 6    -1    -1     1        2        2        2

We draw the scatter plots of the three sample paths, Rad_rw_r, Rad_rw_g, and Rad_rw_b, of the standard Rademacher random walk.

Data_df <- Rad_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Sample Paths of the Standard Rademacher Random Walk Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"success parameter ",~ p == 0.5))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 5
y_max <- max(Data_df$Rad_rw_r,Data_df$Rad_rw_g,Data_df$Rad_rw_b)
y_min <- min(Data_df$Rad_rw_r,Data_df$Rad_rw_g,Data_df$Rad_rw_b)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Rad_rw_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Rad_rw_r, color="col_1")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Rad_rw_g, color="col_2")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Rad_rw_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Rad_rw_sp)

Example 1.5 (Binomial time series) We consider an urn containing \(W\) white balls and \(B\) black balls. We write \(p\equiv\frac{W}{W+B}\) and \(q\equiv\frac{B}{W+B}\); note that \(q=1-p\). Having fixed \(T\in\mathbb{N}\), at each time \(t=1,\dots,T\), we draw \(n\) balls from the urn with replacement and write \(x_{t}\) for the number of white balls occurring in the draws. We obtain a time series \(\left(x_{t}\right)_{t=1}^{T}\) of points in \(\mathbb{R}\), such that \(x_{t}\) can be any number in \(\left\{0,1,\dots,n\right\}\), for every \(t=1,\dots,T\).

Definition 1.7 (Binomial time series) We call a time series of the type presented in Example 1.5 a binomial time series with number of trials parameter \(n\) and success parameter \(p\). In the particular case \(p=1/2\), the binomial time series is called standard.

In what follows, we will see that a binomial time series can be thought of as a sample path of a binomial process.

A binomial process turns out to be a process with independent and identically distributed components. In particular, it is a strong-sense stationary and wide-sense ergodic process (see Definitions 6.3 and 8.8).
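Since the components of a binomial process are independent and identically distributed with mean \(np\), the sample mean of a long binomial time series should be close to \(np\). A quick check of our own, taking \(n=10\) and \(p=1/2\) for illustration (the seed and the length are arbitrary choices):

```r
set.seed(1)
x <- rbinom(n=10000, size=10, prob=0.5)  # a standard binomial time series with 10 trials
mean(x)                                  # close to the component mean n*p = 5
```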

We build three standard binomial time series Bin_r, Bin_g, and Bin_b with number of trials parameter \(n=10\) (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Binomial.html; see also https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html).

length <- 150                                       # Setting the length of the time series.
trial_num <- 10                                     # Setting the number of trials parameter.
p <- 0.5                                            # Setting the success parameter.
set.seed(12345)                                     # Setting the random seed "12345" for reproducibility.
Bin_r <- rbinom(n=length, size=trial_num, prob=p)   # Simulating the draws of n balls from an urn 
                                                    # with replacement, by sampling from the binomial 
                                                    # distribution with size n and success probability p.
head(Bin_r, 30)                                     # Showing the first 30 entries of the vector *Bin_r*.
##  [1] 6 7 6 7 5 3 4 5 6 9 2 3 6 1 5 5 5 5 4 8 5 4 8 6 6 5 6 5 4 5
set.seed(23451)   
Bin_g <- rbinom(n=length, size=trial_num, prob=p)
head(Bin_g, 30)
##  [1] 7 5 6 5 5 3 6 4 2 5 9 4 5 5 6 4 5 7 5 8 4 5 5 4 4 3 4 6 7 4
set.seed(34512)  
Bin_b <- rbinom(n=length, size=trial_num, prob=p) 
head(Bin_b, 30)
##  [1] 8 4 6 4 7 5 5 2 3 5 5 5 4 6 5 2 5 9 3 5 5 4 6 5 6 6 8 4 3 5

To get better insight into the structure of the time series, we consider the scatter plots of the data sets Bin_r, Bin_g, and Bin_b against their indexing variable. To this end, we begin by building a data frame Bin_df made up of the indexing variable and the data sets.

# Building a data frame from the time series and the indexing variable.
Bin_df <- data.frame(t=1:length, Bin_r=Bin_r, Bin_g=Bin_g, Bin_b=Bin_b) 
head(Bin_df)    # Showing the initial part of the data frame.
##   t Bin_r Bin_g Bin_b
## 1 1     6     7     8
## 2 2     7     5     4
## 3 3     6     6     6
## 4 4     7     5     4
## 5 5     5     5     7
## 6 6     3     3     5
tail(Bin_df)    # Showing the final part of the data frame.
##       t Bin_r Bin_g Bin_b
## 145 145     3     4     2
## 146 146     8     5     7
## 147 147     9     7     4
## 148 148     5     6     7
## 149 149     7     4     6
## 150 150     4     5     6

We then exploit the functions of the library ggplot2 to draw the scatter plots of the three standard binomial time series with number of trials parameter \(n=10\).

Data_df <- Bin_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Standard Binomial Time Series Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"number of trials parameter ",~ n == 10,~~~~"success parameter ",~ p == 0.5,~"."))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 3
y_max <- max(Data_df$Bin_r,Data_df$Bin_b,Data_df$Bin_g)
y_min <- min(Data_df$Bin_r,Data_df$Bin_b,Data_df$Bin_g)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- floor(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Bin_r_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Bin_r, color="col_1"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0),
        axis.text.x=element_text(angle=0, vjust=1))

Bin_g_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Bin_g, color="col_2"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1))

Bin_b_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Bin_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1),
  legend.key.width=unit(0.8,"cm"), legend.position="bottom")

grid.arrange(Bin_r_sp,Bin_g_sp,Bin_b_sp, nrow=3, ncol=1, heights=c(0.37,0.31,0.32))

Example 1.6 (Sample path of the standard binomial counting process) Consider the standard binomial time series \(\left(x_{t}\right)_{t=1}^{T}\), with number of trials parameter \(n\), of length \(T\), for some \(T\in\mathbb{N}\), and set \[\begin{equation} y_{t}\overset{\text{def}}{=}\sum_{s=1}^{t}x_{s}, \quad\forall t\in\left\{1,\dots,T\right\}. \tag{1.4} \end{equation}\] Then \(\left(y_{t}\right)_{t=1}^{T}\) is also a time series of points in \(\mathbb{R}\), such that \(y_{t}\) can be any integer in \(\left\{0,1,\dots,nt\right\}\), for every \(t=1,\dots,T\).

Definition 1.8 (Sample path of the standard binomial counting process) We call a time series of the type presented in Example 1.6 a sample path of the standard binomial counting process with number of trials parameter \(n\).

A standard binomial counting process with number of trials parameter \(n\), for some \(n\in\mathbb{N}\), is a process with independent and identically distributed increments. More specifically, it is a random walk with drift and no deterministic trend (see Definition 11.2).
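As an informal numerical illustration of the drift (a sketch under the section's assumptions \(n=10\) and \(p=0.5\), not part of the notes' plotting pipeline), the increments of the counting process are i.i.d. Binomial\((n,p)\), so \(\mathbb{E}\left[y_{t}\right]=npt\) and the path grows, on average, by \(np\) per unit time:

```r
# Illustrative sketch: the standard binomial counting process grows, on
# average, by n*p per unit time (assumed parameters n = 10, p = 0.5).
set.seed(12345)
n <- 10; p <- 0.5; T_len <- 10000
x <- rbinom(n = T_len, size = n, prob = p)  # i.i.d. Binomial(n, p) increments
y <- cumsum(x)                              # sample path of the counting process
y[T_len] / T_len                            # empirical drift, close to n*p = 5
```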

We build three sample paths of the standard binomial counting process with number of trials parameter \(n=10\) from the three standard binomial time series Bin_r, Bin_g, and Bin_b.

Bin_rw_r <- cumsum(Bin_r) # The command "cumsum(x)" generates the sequential sum 
Bin_rw_g <- cumsum(Bin_g) # of the entries of the vector "x".
Bin_rw_b <- cumsum(Bin_b)
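Since cumsum and first differencing are inverse operations, diff recovers the original increments from a sample path; a minimal illustration (the numeric values below are just the first six entries of Bin_r, hard-coded so the snippet is self-contained):

```r
# Sanity check: differencing undoes the cumulative sum.
x <- c(6, 7, 6, 7, 5, 3)     # increments (first six entries of Bin_r)
y <- cumsum(x)               # running totals: 6 13 19 26 31 34
x_back <- c(y[1], diff(y))   # first entry plus successive differences
identical(x, x_back)         # TRUE
```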

We add the data sets Bin_rw_r, Bin_rw_g, and Bin_rw_b to the data frame Bin_df.

# library(tibble)
Bin_df <- add_column(Bin_df, Bin_rw_r=Bin_rw_r, Bin_rw_g=Bin_rw_g, Bin_rw_b=Bin_rw_b, .after="Bin_b")
head(Bin_df)
##   t Bin_r Bin_g Bin_b Bin_rw_r Bin_rw_g Bin_rw_b
## 1 1     6     7     8        6        7        8
## 2 2     7     5     4       13       12       12
## 3 3     6     6     6       19       18       18
## 4 4     7     5     4       26       23       22
## 5 5     5     5     7       31       28       29
## 6 6     3     3     5       34       31       34

We draw the scatter plots of the three sample paths, Bin_rw_r, Bin_rw_g, and Bin_rw_b, of the standard binomial counting process.

Data_df <- Bin_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Sample Paths of the Standard Binomial Counting Process Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"number of trials parameter ",~ n == 10,~~~~"success parameter ",~ p == 0.5,~"."))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 5
y_max <- max(Data_df$Bin_rw_r,Data_df$Bin_rw_g,Data_df$Bin_rw_b)
y_min <- min(Data_df$Bin_rw_r,Data_df$Bin_rw_g,Data_df$Bin_rw_b)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Bin_rw_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Bin_rw_r, color="col_1")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Bin_rw_g, color="col_2")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Bin_rw_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Bin_rw_sp)

Example 1.7 (Standard Poisson time series) Assume we fix a unit of time such that, on average, we get one message from our favorite messenger program in the unit time interval. Having fixed some \(T\in\mathbb{N}\), we consider \(T\) unit time intervals and write \(x_{t}\) for the number of messages that we get in the \(t\)th time interval, for \(t=1,\dots,T\). We obtain a time series \(\left(x_{t}\right)_{t=1}^{T}\) of points in \(\mathbb{R}\), such that \(x_{t}\) can be any nonnegative integer, for every \(t=1,\dots,T\).

Definition 1.9 (Standard Poisson time series) We call a time series of the type presented in Example 1.7 a standard Poisson time series.

In what follows, we will see that a standard Poisson time series can be thought of as a sample path of the standard Poisson process with rate parameter \(\lambda=1\).
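As an informal check (a sketch with \(\lambda=1\), as in the text), the Poisson distribution has mean and variance both equal to \(\lambda\), so the sample mean and sample variance of a long standard Poisson time series should both be close to one:

```r
# Illustrative sketch: for a standard Poisson time series, the sample mean
# and sample variance should both be close to the rate lambda = 1.
set.seed(12345)
x <- rpois(n = 10000, lambda = 1)
mean(x)   # close to 1
var(x)    # close to 1
```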

We build three standard Poisson time series Poiss_r, Poiss_g, and Poiss_b (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Poisson.html/ see also https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html/).

length <- 150                          
set.seed(12345)                        
Poiss_r <- rpois(n=length, lambda=1)   # Simulating the arrival of messages by sampling from 
                                       # the standard Poisson distribution.
head(Poiss_r, 30)                      # Showing the first 30 entries of the vector *Poiss_r*.
##  [1] 1 2 2 2 1 0 0 1 1 4 0 0 1 0 1 1 1 1 0 3 1 0 3 1 1 1 1 1 0 1
set.seed(23451)   
Poiss_g <- rpois(n=length, lambda=1)
head(Poiss_g, 30)
##  [1] 2 1 1 1 1 0 1 1 0 1 4 0 1 1 2 0 1 2 1 4 0 1 1 0 0 0 0 1 3 0
set.seed(34512)  
Poiss_b <- rpois(n=length, lambda=1) 
head(Poiss_b, 30)
##  [1] 3 0 1 0 2 1 1 0 0 1 1 1 0 1 1 0 1 5 0 1 1 0 2 1 1 2 3 0 0 1

We consider the scatter plots of the data sets Poiss_r, Poiss_g, and Poiss_b against their indexing variable. To this end, we start by building a data frame Poiss_df containing the indexing variable and the data sets.

# Building a data frame from the time series and the indexing variable.
Poiss_df <- data.frame(t=1:length, Poiss_r=Poiss_r, Poiss_g=Poiss_g, Poiss_b=Poiss_b) 

head(Poiss_df)    # Showing the initial part of the data frame.
##   t Poiss_r Poiss_g Poiss_b
## 1 1       1       2       3
## 2 2       2       1       0
## 3 3       2       1       1
## 4 4       2       1       0
## 5 5       1       1       2
## 6 6       0       0       1
tail(Poiss_df)    # Showing the final part of the data frame.
##       t Poiss_r Poiss_g Poiss_b
## 145 145       0       0       0
## 146 146       3       1       3
## 147 147       4       3       0
## 148 148       1       1       2
## 149 149       2       0       2
## 150 150       1       1       2

Hence, we can use the command ggplot to draw the scatter plots of the standard Poisson time series.

Data_df <- Poiss_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of the Standard Poisson Time Series Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"rate parameter ",~ lambda == 1))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 5
y_max <- max(Data_df$Poiss_r,Data_df$Poiss_g,Data_df$Poiss_b)
y_min <- min(Data_df$Poiss_r,Data_df$Poiss_g,Data_df$Poiss_b)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Poiss_r_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Poiss_r, color="col_1"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0),
        axis.text.x=element_text(angle=0, vjust=1))

Poiss_g_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Poiss_g, color="col_2"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1))

Poiss_b_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Poiss_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")

grid.arrange(Poiss_r_sp,Poiss_g_sp,Poiss_b_sp, nrow=3, ncol=1, heights=c(0.37,0.31,0.32))

Example 1.8 (Sample path of the standard Poisson counting process) Consider the standard Poisson time series \(\left(x_{t}\right)_{t=1}^{T}\) of length \(T\), for some \(T\in\mathbb{N}\), and set \[\begin{equation} y_{t}\overset{\text{def}}{=}\sum_{s=1}^{t}x_{s}, \quad\forall t\in\left\{1,\dots,T\right\}. \tag{1.5} \end{equation}\] Then \(\left(y_{t}\right)_{t=1}^{T}\) is also a time series of points in \(\mathbb{R}\), such that \(y_{t}\) can be any nonnegative integer, for every \(t=1,\dots,T\).

Definition 1.10 (Sample path of the standard Poisson counting process) We call a time series of the type presented in Example 1.8 a sample path of the standard Poisson counting process with rate parameter \(\lambda=1\).

A standard Poisson counting process with rate parameter \(\lambda\), for some \(\lambda\in\mathbb{R}_{++}\), is a process with independent and identically distributed increments. More specifically, it is a random walk with drift and no deterministic trend (see Definition 11.2).
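Analogously to the binomial case, an informal numerical sketch of the drift (with \(\lambda=1\) assumed, as in the text): since \(\mathbb{E}\left[y_{t}\right]=\lambda t\), the sample path grows, on average, by \(\lambda\) per unit time:

```r
# Illustrative sketch: the standard Poisson counting process grows, on
# average, by lambda per unit time (lambda = 1 assumed, as in the text).
set.seed(12345)
lambda <- 1; T_len <- 10000
y <- cumsum(rpois(n = T_len, lambda = lambda))  # sample path
y[T_len] / T_len                                # empirical drift, close to lambda
```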

We build three sample paths of the standard Poisson counting process from the three standard Poisson time series Poiss_r, Poiss_g, and Poiss_b.

Poiss_rw_r <- cumsum(Poiss_r)
Poiss_rw_g <- cumsum(Poiss_g)
Poiss_rw_b <- cumsum(Poiss_b)

We add the data sets Poiss_rw_r, Poiss_rw_g, and Poiss_rw_b to the data frame Poiss_df.

# library(tibble)
Poiss_df <- add_column(Poiss_df, Poiss_rw_r=Poiss_rw_r, Poiss_rw_g=Poiss_rw_g, Poiss_rw_b=Poiss_rw_b, .after="Poiss_b")
head(Poiss_df)
##   t Poiss_r Poiss_g Poiss_b Poiss_rw_r Poiss_rw_g Poiss_rw_b
## 1 1       1       2       3          1          2          3
## 2 2       2       1       0          3          3          3
## 3 3       2       1       1          5          4          4
## 4 4       2       1       0          7          5          4
## 5 5       1       1       2          8          6          6
## 6 6       0       0       1          8          6          7

We draw the scatter plots of the three sample paths of the Poisson counting process Poiss_rw_r, Poiss_rw_g, and Poiss_rw_b.

Data_df <- Poiss_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023","Scatter Plot of Three Sample Paths of the Standard Poisson Counting Process Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"rate parameter ",~ lambda == 1))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 5
y_max <- max(Data_df$Poiss_rw_b,Data_df$Poiss_rw_r,Data_df$Poiss_rw_g)
y_min <- min(Data_df$Poiss_rw_b,Data_df$Poiss_rw_r,Data_df$Poiss_rw_g)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Poiss_rw_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Poiss_rw_r, color="col_1")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Poiss_rw_g, color="col_2")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Poiss_rw_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Poiss_rw_sp)

Example 1.9 (Standard Gaussian time series) We randomly sample a number from the standard Gaussian distribution \(T\) times, for some \(T\in\mathbb{N}\). We obtain a time series \(\left(x_{t}\right)_{t=1}^{T}\) of points in \(\mathbb{R}\), such that \(x_{t}\) can be any real number.

Definition 1.11 (Standard Gaussian time series) We call a time series of the type presented in Example 1.9 a standard Gaussian time series.

In what follows, we will see that a standard Gaussian time series can be thought of as a sample path of the standard Gaussian process. The latter turns out to be a strong white noise process (see Definition 9.1).
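One empirical fingerprint of a strong white noise is that its sample autocorrelations at all nonzero lags are close to zero; a minimal sketch (the \(\pm 1.96/\sqrt{T}\) band is the usual large-sample benchmark):

```r
# Illustrative sketch: sample autocorrelations of an i.i.d. Gaussian series
# at lags 1, ..., 10 should all be small, of order 1/sqrt(T).
set.seed(12345)
T_len <- 1000
x <- rnorm(n = T_len, mean = 0, sd = 1)
rho_hat <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]  # drop the trivial lag 0
max(abs(rho_hat))   # well below 1
```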

We build three standard Gaussian time series Gauss_r, Gauss_g, and Gauss_b (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Normal.html/ see also https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html/).

length <- 150                               # Setting the length of the time series.
set.seed(12345)                             # Setting the random seed "12345" for reproducibility.
Gauss_r <- rnorm(n=length, mean=0, sd=1)    # Simulating the sampling from a Gaussian distribution.
head(Gauss_r, 30)                           # Showing the first 30 entries of the vector *Gauss_r*.
##  [1]  0.5855288  0.7094660 -0.1093033 -0.4534972  0.6058875 -1.8179560
##  [7]  0.6300986 -0.2761841 -0.2841597 -0.9193220 -0.1162478  1.8173120
## [13]  0.3706279  0.5202165 -0.7505320  0.8168998 -0.8863575 -0.3315776
## [19]  1.1207127  0.2987237  0.7796219  1.4557851 -0.6443284 -1.5531374
## [25] -1.5977095  1.8050975 -0.4816474  0.6203798  0.6121235 -0.1623110
set.seed(23451)   
Gauss_g <- rnorm(n=length, mean=0, sd=1)
head(Gauss_g, 30)
##  [1]  1.22025171  0.51161039  0.29333571  0.41110479 -2.19280162  2.43083966
##  [7]  0.17980768  0.90902387 -0.07787957  0.27788769 -0.47136488 -0.15080225
## [13] -0.70588028 -0.48866578  1.59742012 -0.96274062  0.57960429 -0.49276176
## [19] -0.09504695 -0.81029678  1.24586718 -0.15172606 -1.91062259 -1.22215123
## [25]  1.33198075  0.32565982 -0.18463640  1.06997972 -0.67972803  0.83826051
set.seed(34512)  
Gauss_b <- rnorm(n=length, mean=0, sd=1) 
head(Gauss_b, 30)
##  [1]  1.81710386  0.55161637  1.11137544  0.13105810 -0.98000415  0.19093294
##  [7] -0.40066585 -0.01188588 -0.29639708 -1.27366316  0.29692993  0.85182847
## [13]  0.59033471  1.97486450 -1.33984935 -0.78122934  0.11834652 -1.05751290
## [19]  0.74990830  0.73138780  0.23231437 -0.19755703  0.63961470 -0.19329883
## [25] -1.19057725 -0.42250101 -0.70632371 -0.50936601 -0.03560262  0.47006425

We consider the scatter plots of the data sets Gauss_r, Gauss_g, and Gauss_b against their indexing variable. We start by building a data frame Gauss_df containing the data sets and the indexing variable itself.

# Building a data frame from the time series and the indexing variable.
Gauss_df <- data.frame(t=1:length, Gauss_r=Gauss_r, Gauss_g=Gauss_g, Gauss_b=Gauss_b) 

head(Gauss_df)    # Showing the initial part of the data frame.
##   t    Gauss_r    Gauss_g    Gauss_b
## 1 1  0.5855288  1.2202517  1.8171039
## 2 2  0.7094660  0.5116104  0.5516164
## 3 3 -0.1093033  0.2933357  1.1113754
## 4 4 -0.4534972  0.4111048  0.1310581
## 5 5  0.6058875 -2.1928016 -0.9800041
## 6 6 -1.8179560  2.4308397  0.1909329
tail(Gauss_df)    # Showing the final part of the data frame.
##       t     Gauss_r    Gauss_g    Gauss_b
## 145 145  0.01585569  0.5995403  1.2710036
## 146 146  0.54016957 -1.5941019 -0.9912820
## 147 147 -1.54729197 -0.7017336 -0.6789599
## 148 148  0.84965293  0.2423192 -0.1396153
## 149 149  0.89601318  0.7225111  1.4802117
## 150 150  0.13869100 -0.3104763  1.2438692

Hence, we draw the scatter plot of the standard Gaussian time series.

Data_df <- Gauss_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of the Standard Gaussian Time Series Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;", ~~~~  "Mean parameter" ~ mu==0  ~~~~ "Variance parameter" ~ sigma^2==1))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 5
y_max <- max(Data_df$Gauss_r,Data_df$Gauss_g,Data_df$Gauss_b)
y_min <- min(Data_df$Gauss_r,Data_df$Gauss_g,Data_df$Gauss_b)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 1
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Gauss_r_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Gauss_r, color="col_1"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0),
        axis.text.x=element_text(angle=0, vjust=1))

Gauss_g_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Gauss_g, color="col_2"), show.legend=FALSE) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1))

Gauss_b_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Gauss_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.subtitle=element_text(hjust=0), axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")

grid.arrange(Gauss_r_sp,Gauss_g_sp,Gauss_b_sp, nrow=3, ncol=1, heights=c(0.37,0.31,0.32))

Example 1.10 (Sample path of the standard Gaussian random walk) Consider the standard Gaussian time series \(\left(x_{t}\right)_{t=1}^{T}\) and set \[\begin{equation} y_{t}\overset{\text{def}}{=}\sum_{s=1}^{t}x_{s},\quad\forall t\in\left\{1,\dots,T\right\}. \tag{1.6} \end{equation}\] Then \(\left(y_{t}\right)_{t=1}^{T}\) is also a time series of points in \(\mathbb{R}\), such that \(y_{t}\) can be any real number, for every \(t=1,\dots,T\).

Definition 1.12 (Sample path of the standard Gaussian random walk) We call a time series of the type presented in Example 1.10 a sample path of the standard Gaussian random walk.

The standard Gaussian random walk is a process with independent and identically distributed increments. More specifically, it is a random walk with no drift and no deterministic trend (see Definition 11.1). Unlike the binomial and Poisson processes above, its increments can take negative values, so it is not a counting process.
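Although the standard Gaussian random walk has no drift, its dispersion grows over time: \(\operatorname{Var}\left(y_{t}\right)=t\). An informal sketch (an illustration across many independent simulated paths, not part of the notes' pipeline):

```r
# Illustrative sketch: for the standard Gaussian random walk, Var(y_t) = t,
# so the cross-sectional standard deviation over many independent sample
# paths grows like sqrt(t).
set.seed(12345)
n_paths <- 2000; T_len <- 100
paths <- replicate(n_paths, cumsum(rnorm(T_len)))  # one column per path
sd(paths[T_len, ]) / sqrt(T_len)                   # close to 1
```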

We build three sample paths of the standard Gaussian random walk from the three standard Gaussian time series Gauss_r, Gauss_g, and Gauss_b.

Gauss_rw_r <- cumsum(Gauss_r)
Gauss_rw_g <- cumsum(Gauss_g)
Gauss_rw_b <- cumsum(Gauss_b)

We add the data sets Gauss_rw_r, Gauss_rw_g, and Gauss_rw_b to the data frame Gauss_df.

# library(tibble)
Gauss_df <- add_column(Gauss_df, Gauss_rw_r=Gauss_rw_r, Gauss_rw_g=Gauss_rw_g, Gauss_rw_b=Gauss_rw_b, .after="Gauss_b")
head(Gauss_df)
##   t    Gauss_r    Gauss_g    Gauss_b Gauss_rw_r Gauss_rw_g Gauss_rw_b
## 1 1  0.5855288  1.2202517  1.8171039  0.5855288   1.220252   1.817104
## 2 2  0.7094660  0.5116104  0.5516164  1.2949948   1.731862   2.368720
## 3 3 -0.1093033  0.2933357  1.1113754  1.1856915   2.025198   3.480096
## 4 4 -0.4534972  0.4111048  0.1310581  0.7321943   2.436303   3.611154
## 5 5  0.6058875 -2.1928016 -0.9800041  1.3380818   0.243501   2.631150
## 6 6 -1.8179560  2.4308397  0.1909329 -0.4798742   2.674341   2.822083

We draw the scatter plots of the three sample paths, Gauss_rw_r, Gauss_rw_g, and Gauss_rw_b, of the standard Gaussian random walk.

Data_df <- Gauss_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Sample Paths of the Standard Gaussian Random Walk Against the Indexing Variable"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;", ~~~~  "Mean parameter" ~ mu==0  ~~~~ "Variance parameter" ~ sigma^2==1))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 5
y_max <- max(Data_df$Gauss_rw_r,Data_df$Gauss_rw_g,Data_df$Gauss_rw_b)
y_min <- min(Data_df$Gauss_rw_r,Data_df$Gauss_rw_g,Data_df$Gauss_rw_b)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("random seed" ~  12345)
col_2 <- bquote("random seed" ~  23451)
col_3 <- bquote("random seed" ~  34512)
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="red", "col_2"="green", "col_3"="blue")
leg_breaks <- c("col_1", "col_2", "col_3")
# library(ggplot2)
Gauss_rw_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Gauss_rw_r, color="col_1")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Gauss_rw_g, color="col_2")) +
  geom_point(alpha=1, size=1.5, shape=19, aes(y=Gauss_rw_b, color="col_3")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Gauss_rw_sp)

In this case, we also consider a line plot.

title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of Three Sample Paths of the Standard Gaussian Random Walk Against the Indexing Variable"))
Gauss_rw_lp <- ggplot(Data_df, aes(x=t)) + 
  geom_line(alpha=1, linewidth=0.8, aes(y=Gauss_rw_r, color="col_1"), group=1) +
  geom_line(alpha=1, linewidth=0.8, aes(y=Gauss_rw_g, color="col_2"), group=1) +
  geom_line(alpha=1, linewidth=0.8, aes(y=Gauss_rw_b, color="col_3"), group=1) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), 
        plot.subtitle=element_text(hjust=0.5),
        legend.position="bottom")
plot(Gauss_rw_lp)

Now, we consider some real time series.

First, the annual time series of the Italian Public Debt to Gross Domestic Product ratio from \(1861\) to \(2020\). Thanks to the Osservatorio sui Conti Pubblici Italiani - Università Cattolica del Sacro Cuore (see https://osservatoriocpi.unicatt.it/cpi-archivio-studi-e-analisi-i-numeri-della-finanza-pubblica-dal-1861-a-oggi/).

We download the file cpi-cpi-SERIE_STORICHE_aggiornamento_28-01-2020.xlsx, save it as a .csv file, and, with some manipulation, create the Pub_Deb.csv file, to be read as a data.frame object in R.
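The absolute path in the next chunk is specific to one machine; as a self-contained sketch of the same read.csv pattern, we can write the first two records of Pub_Deb.csv (as shown below) to a temporary file, so the snippet runs anywhere:

```r
# Self-contained sketch: a tiny file with the same column layout as
# Pub_Deb.csv, written to a temporary directory (for illustration only).
csv_file <- file.path(tempdir(), "Pub_Deb_demo.csv")
writeLines(c("t,Year,DPR,DSP,SI",
             "1,1861,38.16,NA,NA",
             "2,1862,39.90,1.6,1.6"), csv_file)
demo_df <- read.csv(csv_file, header = TRUE)
demo_df            # two rows; the string "NA" is parsed as a missing value
```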

# Reading data in a data.frame object from a .csv file
Pub_Deb_df <- read.csv("C:/Users/rober/My Documents - Notebook (local)/My Classes/MPSMF/R - Scripts & Data/Data/Pub_Deb.csv", header=TRUE)
head(Pub_Deb_df)
##   t Year   DPR DSP  SI
## 1 1 1861 38.16  NA  NA
## 2 2 1862 39.90 1.6 1.6
## 3 3 1863 47.30 2.1 2.1
## 4 4 1864 58.40 2.4 2.4
## 5 5 1865 63.00 2.9 2.9
## 6 6 1866 67.10 2.8 2.8
str(Pub_Deb_df)
## 'data.frame':    160 obs. of  5 variables:
##  $ t   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Year: int  1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 ...
##  $ DPR : num  38.2 39.9 47.3 58.4 63 ...
##  $ DSP : num  NA 1.6 2.1 2.4 2.9 2.8 3.1 2.8 3.5 4 ...
##  $ SI  : num  NA 1.6 2.1 2.4 2.9 2.8 3.1 2.8 3.5 4 ...

Then we consider the scatter plot of the DPR column against the Year column.

Data_df <- Pub_Deb_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Year[1])
Last_Date <- paste(Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of the Annual Debt-GDP Ratio from ", .(First_Date), " to ", .(Last_Date), sep="")))
subtitle_content <- bquote(paste("sample path length ", .(length), " sample points. By courtesy of Osservatorio sui Conti Pubblici Italiani"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 16
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- as.character(Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote("Debt to GDP")
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$DPR)-min(Data_df$DPR))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$DPR)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$DPR)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Debt-GDP")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green3")
leg_breaks <- c("col_1", "col_2", "col_3")
DPR_sp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=DPR, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=DPR, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.5, shape=19, aes(x=t, y=DPR, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), 
                                                           linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(DPR_sp)
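
The axis-break computation above (bin width, break sequence, and the conditional appending of the last index) is repeated almost verbatim for each dataset in this section. As a hypothetical refactoring (the function name and interface are our own, not part of the original scripts), it can be collected in a small helper:

```r
# Hypothetical helper (names are ours): evenly spaced axis breaks over
# the indices 1..n, appending the last index whenever the leftover gap
# exceeds half a bin width, as done repeatedly in the plots above.
make_breaks <- function(n, breaks_num) {
  binwidth <- floor((n - 1) / breaks_num)
  breaks <- seq(from = 1, to = n, by = binwidth)
  if ((n - max(breaks)) > binwidth / 2) breaks <- c(breaks, n)
  breaks
}
make_breaks(160, 16)  # the breaks used for the 160-year Debt-GDP series
```

With such a helper, the labels would be obtained as, e.g., `Data_df$Year[make_breaks(nrow(Data_df), 16)]`.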

The line plot.

title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Annual Debt-GDP Ratio from ", .(First_Date), " to ", .(Last_Date), sep="")))
DPR_lp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=DPR, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=DPR, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=DPR, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(linetype=c("solid", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(DPR_lp)

As another example of a real time series, we consider the stock market data of Alphabet Inc., an American multinational conglomerate listed on the Nasdaq stock exchange, which is the parent company of Google and of several Google subsidiaries.

We download the data from Yahoo Finance and create the GOOGLE - 2016-05-16 - 2021-05-13.csv file, to be read as a data.frame object in R.

# We build a data frame with daily Google stock data from the "GOOGLE - 2016-05-16 - 2021-05-13.csv" file.
Google_df <- read.csv("C:/Users/rober/My Documents - Notebook (local)/My Classes/MPSMF/R - Scripts & Data/Data/GOOGLE - 2016-05-16 - 2021-05-13.csv", header=TRUE)
class(Google_df)
## [1] "data.frame"
show(Google_df[1:15,])
##          Date   Open    High     Low  Close Adj.Close  Volume
## 1  2016-05-16 709.13 718.480 705.650 716.49    716.49 1317100
## 2  2016-05-17 715.99 721.520 704.110 706.23    706.23 2001200
## 3  2016-05-18 703.67 711.600 700.630 706.63    706.63 1766800
## 4  2016-05-19 702.36 706.000 696.800 700.32    700.32 1670200
## 5  2016-05-20 701.62 714.580 700.520 709.74    709.74 1828400
## 6  2016-05-23 706.53 711.478 704.180 704.24    704.24 1330700
## 7  2016-05-24 706.86 720.970 706.860 720.09    720.09 1929500
## 8  2016-05-25 720.76 727.510 719.705 725.27    725.27 1629200
## 9  2016-05-26 722.87 728.330 720.280 724.12    724.12 1576300
## 10 2016-05-27 724.01 733.936 724.000 732.66    732.66 1975000
## 11 2016-05-31 731.74 739.730 731.260 735.72    735.72 2129500
## 12 2016-06-01 734.53 737.210 730.660 734.15    734.15 1253600
## 13 2016-06-02 732.50 733.020 724.170 730.40    730.40 1341800
## 14 2016-06-03 729.27 729.490 720.560 722.34    722.34 1226300
## 15 2016-06-06 724.91 724.910 714.610 716.55    716.55 1565300
# We check whether the Date column is in "Date" format. In case it is not, we change the format to "Date".
class(Google_df$Date)
## [1] "character"
Google_df$Date <- as.Date(Google_df$Date, format="%Y-%m-%d")
class(Google_df$Date)
## [1] "Date"
# We add an index column
Google_df <- add_column(Google_df, t=1:nrow(Google_df), .before="Date")
show(Google_df[1:15,])
##     t       Date   Open    High     Low  Close Adj.Close  Volume
## 1   1 2016-05-16 709.13 718.480 705.650 716.49    716.49 1317100
## 2   2 2016-05-17 715.99 721.520 704.110 706.23    706.23 2001200
## 3   3 2016-05-18 703.67 711.600 700.630 706.63    706.63 1766800
## 4   4 2016-05-19 702.36 706.000 696.800 700.32    700.32 1670200
## 5   5 2016-05-20 701.62 714.580 700.520 709.74    709.74 1828400
## 6   6 2016-05-23 706.53 711.478 704.180 704.24    704.24 1330700
## 7   7 2016-05-24 706.86 720.970 706.860 720.09    720.09 1929500
## 8   8 2016-05-25 720.76 727.510 719.705 725.27    725.27 1629200
## 9   9 2016-05-26 722.87 728.330 720.280 724.12    724.12 1576300
## 10 10 2016-05-27 724.01 733.936 724.000 732.66    732.66 1975000
## 11 11 2016-05-31 731.74 739.730 731.260 735.72    735.72 2129500
## 12 12 2016-06-01 734.53 737.210 730.660 734.15    734.15 1253600
## 13 13 2016-06-02 732.50 733.020 724.170 730.40    730.40 1341800
## 14 14 2016-06-03 729.27 729.490 720.560 722.34    722.34 1226300
## 15 15 2016-06-06 724.91 724.910 714.610 716.55    716.55 1565300
# As a help to determine a good number of ticks on the x axis
# library("numbers")
primeFactors(nrow(Google_df))
## [1]  2 17 37
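
The prime factorization above suggests candidate numbers of x-axis intervals (e.g. \(17\) or \(2\cdot17=34\) for the \(1258=2\cdot17\cdot37\) observations) that split the index range into nearly equal parts. A minimal base-R sketch of the same idea, computing all divisors directly so that the numbers package is not strictly needed:

```r
# All divisors of the series length n = 1258 = 2 * 17 * 37; each divisor
# is a candidate number of x-axis intervals that splits the observation
# count evenly.
n <- 1258
divisors <- (1:n)[n %% (1:n) == 0]
divisors  # 1, 2, 17, 34, 37, 74, 629, 1258
```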

We draw a scatter plot of the daily Google adjusted close.

Data_df <- Google_df
length <- length(na.omit(Data_df$Adj.Close))
First_Date <- paste(Data_df$Date[1])
Last_Date <- paste(Data_df$Date[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of the Alphabet Inc. Adjusted Close from ", .(First_Date), " to ", .(Last_Date), sep="")))
link <-  "https://finance.yahoo.com/quote/GOOG?p=GOOG"
subtitle_content <- bquote(paste("sample path length ", .(length), " sample points. By courtesy of Yahoo Finance  -  ", .(link)))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 17
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- as.character(Data_df$Date[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote("adjusted close")
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$Adj.Close)-min(Data_df$Adj.Close))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$Adj.Close)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$Adj.Close)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Alphabet adj. close")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green3")
leg_breaks <- c("col_1", "col_2", "col_3")
Google.Adj.Close_sp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=Adj.Close, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=Adj.Close, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=0.9, shape=19, aes(x=t, y=Adj.Close, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), 
                                                           linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Google.Adj.Close_sp)

We save the plot in a file.

file_name = paste("Alphabet Inc. Adjusted Close SP - from ", First_Date, " to ", Last_Date, " - SP.png", sep="")
png(file_name, width=1600, height=800, res=120)
print(Google.Adj.Close_sp)
dev.off()
## png 
##   2
file_name = paste("Alphabet Inc. Adjusted Close SP - from ", First_Date, " to ", Last_Date, " - SP.pdf", sep="")
pdf(file_name, width=12, height=7)
print(Google.Adj.Close_sp)
dev.off()
## png 
##   2

The line plot.

title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Alphabet Inc. Adjusted Close from ", .(First_Date), " to ", .(Last_Date), sep="")))
Google.Adj.Close_lp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=Adj.Close, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=Adj.Close, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=Adj.Close, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(linetype=c("solid", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Google.Adj.Close_lp)

We also consider the Gold Fixing Price at 3:00 P.M. (London time) in the London Bullion Market, quoted in U.S. Dollars. We download the data from the St. Louis FRED website as the US-Daily-Gold-PM-Fix-Prices-From-2020-04-02-2021-04-01.csv file, to be read as a data.frame object in R.

Gold_df <- read.csv("C:/Users/rober/My Documents - Notebook (local)/My Classes/MPSMF/R - Scripts & Data/Data/US-Daily-Gold-PM-Fix-Prices-From-2020-04-02-2021-04-01.csv", header=TRUE)
class(Gold_df)
## [1] "data.frame"
head(Gold_df)
##         Date USD_PM_Fix
## 1 02/04/2020     1616.8
## 2 03/04/2020     1613.1
## 3 06/04/2020     1648.3
## 4 07/04/2020     1649.3
## 5 08/04/2020     1647.8
## 6 09/04/2020     1680.7
tail(Gold_df)
##           Date USD_PM_Fix
## 256 25/03/2021    1737.30
## 257 26/03/2021    1731.80
## 258 29/03/2021    1705.95
## 259 30/03/2021    1683.95
## 260 31/03/2021    1691.05
## 261 01/04/2021    1726.05
# We check whether the Date column is in "Date" format. In case it is not, we change the format to "Date".
class(Gold_df$Date)
## [1] "character"
# library(lubridate)
Gold_df$Date <- as.Date(Gold_df$Date, format="%d/%m/%Y")
class(Gold_df$Date)
## [1] "Date"
head(Gold_df)
##         Date USD_PM_Fix
## 1 2020-04-02     1616.8
## 2 2020-04-03     1613.1
## 3 2020-04-06     1648.3
## 4 2020-04-07     1649.3
## 5 2020-04-08     1647.8
## 6 2020-04-09     1680.7
# We add an index column
Gold_df <- add_column(Gold_df, t=1:nrow(Gold_df), .before="Date")
show(Gold_df[1:15,])
##     t       Date USD_PM_Fix
## 1   1 2020-04-02     1616.8
## 2   2 2020-04-03     1613.1
## 3   3 2020-04-06     1648.3
## 4   4 2020-04-07     1649.3
## 5   5 2020-04-08     1647.8
## 6   6 2020-04-09     1680.7
## 7   7 2020-04-10     1680.7
## 8   8 2020-04-13     1680.7
## 9   9 2020-04-14     1741.9
## 10 10 2020-04-15     1718.7
## 11 11 2020-04-16     1729.5
## 12 12 2020-04-17     1692.6
## 13 13 2020-04-20     1686.2
## 14 14 2020-04-21     1682.1
## 15 15 2020-04-22     1710.6
# As a help to determine a good number of ticks on the x axis
# library("numbers")
primeFactors(nrow(Gold_df))
## [1]  3  3 29

We draw a plot of the PM fix prices.

Data_df <- Gold_df
length <- length(na.omit(Data_df$USD_PM_Fix))
First_Date <- as.character(Data_df$Date[1])
Last_Date <- as.character(last(Data_df$Date))
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of Gold LBMA - PM Fix Price from ", .(First_Date), " to ", .(Last_Date))))
link_1 <- "https://fred.stlouisfed.org/series/GOLDPMGBD228NLBM"
subtitle_content <- bquote(paste("sample path length ", .(length), " sample points. Data by courtesy of FRED - ", .(link_1)))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 9
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- as.character(Data_df$Date[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
# x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=50)
# if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
y_name <- bquote("gold PM-fix price")
y_breaks_num <- 10
y_bound_low <- min(na.omit(Data_df$USD_PM_Fix))
y_bound_up <- max(na.omit(Data_df$USD_PM_Fix))
y_binwidth <- round((y_bound_up-y_bound_low)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_bound_low/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_bound_up/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Gold PM-Fix Price")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green3")
leg_breaks <- c("col_1", "col_2", "col_3")
Gold_PM_Fix_Price_sp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=USD_PM_Fix, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=USD_PM_Fix, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=0.9, shape=19, aes(x=t, y=USD_PM_Fix, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), 
                                                           linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Gold_PM_Fix_Price_sp)

The line plot.

title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of Gold LBMA - PM Fix Price from ", .(First_Date), " to ", .(Last_Date))))
Gold_PM_Fix_Price_lp <- ggplot(Data_df) +
   geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=USD_PM_Fix, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=USD_PM_Fix, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=USD_PM_Fix, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks = y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.80,"cm"), legend.position="bottom")
plot(Gold_PM_Fix_Price_lp)

As a further example of a real time series, we consider the time series of the temperature anomalies with respect to the 20th-century average. The data are provided by the NOAA National Centers for Environmental Information (see https://www.ncdc.noaa.gov/cag/global/time-series/; see also the Berkeley Earth organization, http://berkeleyearth.org/ and http://berkeleyearth.org/global-temperature-report-for-2020/). We download the data.csv data file with default parameters and, after renaming it Temp_An.csv, we read it as a data.frame object in R.

# Reading data in a data.frame object from a .csv file
Temp_An_df <- read.csv("C:/Users/rober/My Documents - Notebook (local)/My Classes/MPSMF/R - Scripts & Data/Data/Temp_An.csv", header=TRUE)
head(Temp_An_df)
##   Year Value
## 1 1880 -0.10
## 2 1881  0.02
## 3 1882 -0.14
## 4 1883 -0.16
## 5 1884 -0.22
## 6 1885 -0.31
tail(Temp_An_df)
##     Year Value
## 137 2016  0.94
## 138 2017  0.87
## 139 2018  0.81
## 140 2019  0.86
## 141 2020  0.94
## 142 2021  0.81
str(Temp_An_df)
## 'data.frame':    142 obs. of  2 variables:
##  $ Year : int  1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 ...
##  $ Value: num  -0.1 0.02 -0.14 -0.16 -0.22 -0.31 -0.19 -0.26 -0.16 -0.03 ...
nrow(Temp_An_df)
## [1] 142
Temp_An_df <- add_column(Temp_An_df, t=1:nrow(Temp_An_df), .before="Year")
head(Temp_An_df)
##   t Year Value
## 1 1 1880 -0.10
## 2 2 1881  0.02
## 3 3 1882 -0.14
## 4 4 1883 -0.16
## 5 5 1884 -0.22
## 6 6 1885 -0.31

Then we consider the scatter plot of the Value column against the Year column.

Data_df <- Temp_An_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Year[1])
Last_Date <- paste(Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of May Temperature Anomalies Against Years (1901-2000 Base Period - Units Degrees Celsius) from ", .(First_Date), " to ", .(Last_Date), sep="")))
link <-  "https://www.ncdc.noaa.gov/cag/global/time-series/"
subtitle_content <- bquote(paste("sample path length ", .(length), " sample points. By courtesy of NOAA  -  ", .(link)))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 16
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- as.character(Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up+J*x_binwidth)
y_name <- bquote("Temperature Anomalies")
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$Value)-min(Data_df$Value))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$Value)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$Value)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Temperature Anomalies")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green3")
leg_breaks <- c("col_1", "col_2", "col_3")
Temp_An_sp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=Value, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=Value, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.5, shape=19, aes(x=t, y=Value, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), 
                                                           linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Temp_An_sp)

The line plot.

title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of May Temperature Anomalies Against Years (1901-2000 Base Period - Units Degrees Celsius) from ", .(First_Date), " to ", .(Last_Date), sep="")))
Temp_An_lp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=Value, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=Value, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=Value, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(linetype=c("solid", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Temp_An_lp)

We now present some websites from which interesting examples of time series can be retrieved. It should be noted that the clickjacking-prevention policy adopted by several of these sites may not allow them to be viewed as web pages embedded in the HTML file compiled by R Markdown. We will try to circumvent this problem, at least partially, by opening the compiled HTML file with the Firefox browser.

We begin by considering the time series of the weather conditions in September 2021, monitored at the LaGuardia Airport weather station in New York City and provided by the Weather Underground website (see https://www.wunderground.com/).

knitr::include_url("https://www.wunderground.com/history/monthly/us/ny/new-york-city/KLGA/date/2021-9")

Unfortunately, the Weather Underground website does not allow the requested page to be opened inside the HTML file generated by R Markdown. We can hope to be luckier in looking for a similar time series for the city of Rome, thanks to the Il Meteo website (see https://www.ilmeteo.it/).

knitr::include_url("https://www.ilmeteo.it/portale/archivio-meteo/Roma")

Other websites providing plenty of historical weather and climate data are the National Oceanic and Atmospheric Administration (NOAA) website (see https://www.noaa.gov/);

knitr::include_url("https://www.noaa.gov/")

the Climate-Data.org website (see https://en.climate-data.org/);

knitr::include_url("https://en.climate-data.org/")

the Visual Crossing: Weather Data & Weather API (see https://www.visualcrossing.com/).

knitr::include_url("https://www.visualcrossing.com/")

Another interesting website is Keras (see https://keras.io/about/) hosting the notebook Timeseries forecasting for weather prediction, authored by Prabhanshu Attri, Yashika Sharma, Kristi Takach, and Falak Shah, which provides some insight on weather time series analysis.

knitr::include_url("https://keras.io/examples/timeseries/timeseries_weather_forecasting/")

The notebook is based on historical data available at the Max-Planck-Institut für Biogeochemie (see https://www.bgc-jena.mpg.de/wetter/).

knitr::include_url("https://www.bgc-jena.mpg.de/wetter/")

A rich source of time series describing various phenomena is the Our World in Data website (see https://ourworldindata.org/).

knitr::include_url("https://ourworldindata.org/")

In particular, the Our World in Data website contains a large number of time series about the CoViD-19 pandemic.

knitr::include_url("https://ourworldindata.org/coronavirus#explore-the-global-situation")

Another rich source of time series about the CoViD-19 pandemic and, more generally, of medical data is the World Health Organization (WHO) website (see https://www.who.int/).

knitr::include_url("https://www.who.int/")

On the WHO website, we find the following dashboard on the Coronavirus.

 knitr::include_url("https://covid19.who.int/")

Finally, we mention the nice Italian dashboard (see https://datastudio.google.com/reporting/91350339-2c97-49b5-92b8-965996530f00/page/RdlHB), maintained since the early days of the spread of the pandemic in our country.

 knitr::include_url("https://datastudio.google.com/reporting/91350339-2c97-49b5-92b8-965996530f00/page/RdlHB")

One might wonder what the presented time series have in common, what the differences among them are, and whether there is any scientific way to predict their future course. The attempt to answer these questions motivates the foundation and development of Time Series Analysis.

2 Probability Spaces and Random Variables.

Time Series Analysis stems from the key concepts of random variable and stochastic process.

Loosely speaking, a random variable is a function of the outcomes of a random phenomenon whose values are observable depending on the available information. Hence, a random variable is a mathematical model to represent a quantitative or categorical observation on the outcomes of a random phenomenon which can be made by an observer in light of the information available to her. A stochastic process is a family of temporally indexed random variables. Hence, a stochastic process aims to represent a temporal sequence of observations on the outcomes of a stochastic phenomenon, which can be made by an observer thanks to the information which becomes progressively available to her.
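
These ideas can be made concrete with a toy simulation. Below is a minimal base-R sketch (our own illustration, not from the source): a Gaussian random walk, i.e. a family \(\left(X_{t}\right)_{t=1,\dots,100}\) of temporally indexed random variables, of which one run of the script produces one observed sample path, that is, one univariate real time series.

```r
# A Gaussian random walk: X_t = Z_1 + ... + Z_t, with i.i.d. standard
# normal innovations Z_t. Each X_t is a random variable; the family
# (X_t) is a stochastic process, and one run of this script produces
# one sample path of the process.
set.seed(1)                # reproducibility of the random draws
Z <- rnorm(100)            # i.i.d. standard normal innovations
X <- cumsum(Z)             # the process at times t = 1, ..., 100
plot(X, type = "l", xlab = "t", ylab = expression(X[t]))
```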

To make these ideas more rigorous, we need to introduce a probability space, which is a triple made of

  1. the set of all outcomes of a random phenomenon whose occurrence can be established unambiguously by the observer, the so-called sample space;

  2. a representation of the information available to the observer by means of a family of subsets of the sample space, referred to as events, which is called a \(\sigma\)-algebra of events, since it satisfies some technical properties;

  3. a probability on the information available to the observer, which is a function that ranks the events according to the possibility of their occurrence.

Using standard notation, we write \(\omega\) for a generic outcome of a random phenomenon and \(\Omega\) for the sample space of the phenomenon. We denote by capital letters, e.g. \(E,F,G,\dots\), various events of the sample space, that is, various subsets of \(\Omega\), and we denote by capital calligraphic letters, e.g. \(\mathcal{E},\mathcal{F},\mathcal{G},\dots\), various pieces of information on the random phenomenon, that is, various families of subsets of \(\Omega\). We write \(\mathbf{P}:\mathcal{E}\rightarrow\mathbb{R}_{+}\) for the probability on the piece of information \(\mathcal{E}\) available to some observer. Finally, we call a probability space any triple \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\) made of a sample space, a \(\sigma\)-algebra of events, and a probability. To shorten notation, a probability space is often denoted by the single symbol \(\Omega\).

Given the sample space \(\Omega\) of a random phenomenon, the power set \(\mathcal{P}\left(\Omega\right)\) of \(\Omega\), that is, the family of all subsets of \(\Omega\), represents the family of all events whose occurrence might in principle be assessed by an observer of the phenomenon. In other words, \(\mathcal{P}\left(\Omega\right)\) represents the whole information on the random phenomenon. However, in many cases, such information might be too large to be mathematically handled in a non-trivial way, or the information actually available to an observer is only a piece of the whole information: part of the information could be hidden or, as happens in stochastic phenomena, it could be revealed only progressively in time. Therefore, it is necessary to consider suitable subfamilies of \(\mathcal{P}\left(\Omega\right)\) which are closed under the logical manipulation of the contained events and allow the possibility of exploiting the mathematical idea of limit. These are the \(\sigma\)-algebras of events, defined as follows.

Definition 2.1 (Sigma Algebra) Given a sample space \(\Omega\), let \(\mathcal{P}\left(\Omega\right)\) be the power set of \(\Omega\). We call a \(\sigma\)-algebra of events of \(\Omega\) any nonempty subfamily \(\mathcal{E}\) of \(\mathcal{P}\left(\Omega\right)\) which satisfies the following conditions

  1. \(E^{c}\in\mathcal{E}\), for every \(E\in\mathcal{E}\);

  2. \(\bigcup_{n=1}^{\infty}E_{n}\in\mathcal{E}\), for every sequence \(\left(E_{n}\right)_{n\geq1}\) in \(\mathcal{E}\).
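
As a simple illustration (our own, not from the source), given a fixed event \(E\) with \(\varnothing\neq E\neq\Omega\), the smallest \(\sigma\)-algebra containing \(E\) is

```latex
\mathcal{E} = \left\{ \varnothing,\; E,\; E^{c},\; \Omega \right\}.
```

Closure under complementation is immediate, and every countable union of sets drawn from this family is again one of the four sets (e.g. \(E\cup E^{c}=\Omega\)), so both conditions of Definition 2.1 hold. Note also that \(\Omega\in\mathcal{E}\) follows from the two conditions alone, by taking the union of the sequence \(E,E^{c},E^{c},\dots\).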

In turn, the probability has to be defined in order to deal with countable unions of events and allow the introduction of the mathematical limit.

Definition 2.2 (Probability) Given a sample space \(\Omega\), let \(\mathcal{E}\) be a \(\sigma\)-algebra of events of \(\Omega\). We call a probability on \(\mathcal{E}\) a function \(\mathbf{P}:\mathcal{E}\rightarrow \mathbb{R}_{+}\) which satisfies the following conditions

  1. \(\mathbf{P}(\Omega)=1\);

  2. \(\mathbf{P}\left({\textstyle\bigcup_{n=1}^{\infty}}E_{n}\right) =\sum_{n=1}^{\infty}\mathbf{P}\left(E_{n}\right)\), for every sequence \(\left(E_{n}\right)_{n\geq1}\) of mutually exclusive events in \(\mathcal{E}\).

Definition 2.3 (Probability of an event) Given a sample space \(\Omega\), let \(\mathcal{E}\) be a \(\sigma\)-algebra of events of \(\Omega\). If \(\mathbf{P}:\mathcal{E}\rightarrow \mathbb{R}_{+}\) is a probability on \(\mathcal{E}\), then for every \(E\in\mathcal{E}\) the positive real number \(\mathbf{P}\left(E\right)\) is called the probability of the event \(E\). It is also customary to call \(\mathbf{P}:\mathcal{E}\rightarrow\mathbb{R}_{+}\) a probability on \(\Omega\), when no confusion can arise about the \(\sigma\)-algebra \(\mathcal{E}\).

As a mere consequence of Definitions 2.1 and 2.2, a probability \(\mathbf{P}:\mathcal{E}\rightarrow\mathbb{R}_{+}\) on a \(\sigma\)-algebra \(\mathcal{E}\) of events of \(\Omega\) satisfies, among others, the following basic properties.

Proposition 2.1 (Properties of a probability) We have

  1. \(\mathbf{P}\left(\mathbb{\varnothing}\right)=0\);

  2. \(\mathbf{P}\left(E^{c}\right)=1-\mathbf{P}\left(E\right)\), for any \(E\in\mathcal{E}\);

  3. \(\mathbf{P}\left(F-E\right)=\mathbf{P}\left(F\right)-\mathbf{P}\left(E\cap F\right)\), for all \(E,F\in\mathcal{E}\); in particular, \(\mathbf{P}\left(F-E\right)=\mathbf{P}\left(F\right)-\mathbf{P}\left(E\right)\) when \(E\subseteq F\);

  4. \(\mathbf{P}\left(E\right)\leq\mathbf{P}\left(F\right)\), for all \(E,F\in\mathcal{E}\) such that \(E\subseteq F\);

  5. \(\mathbf{P}\left(E\right)\leq1\), for any \(E\in\mathcal{E}\);

  6. \(\mathbf{P}\left(E\cup F\right)=\mathbf{P}\left(E\right)+\mathbf{P}\left(F\right)-\mathbf{P}\left(E\cap F\right)\), for all \(E,F\in\mathcal{E}\);

  7. \(\mathbf{P}\left({\textstyle\bigcup_{k=1}^{n}}E_{k}\right)=\sum_{k=1}^{n}\mathbf{P}\left(E_{k}\right)\), for any finite sequence \(\left(E_{k}\right)_{k=1}^{n}\) in \(\mathcal{E}\) such that \(E_{k_{1}}\cap E_{k_{2}}=\varnothing\) whenever \(k_{1}\neq k_{2}\);

  8. \(\mathbf{P}\left({\textstyle\bigcup_{n=1}^{\infty}}E_{n}\right)\leq\sum_{n=1}^{\infty}\mathbf{P}\left(E_{n}\right)\), for any sequence \(\left(E_{n}\right)_{n\geq1}\) in \(\mathcal{E}\).
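
Since all the properties in Proposition 2.1 are finitary, they can be checked mechanically on a small discrete example. The following Python sketch is our own illustration, not part of the notes: it verifies properties 2, 3, 4, 5 and 6 for a hypothetical loaded four-sided die, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical loaded four-sided die: a sample space and an (assumed)
# probability distribution on its elementary events.
omega = frozenset({1, 2, 3, 4})
p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def prob(event):
    """P(E): sum the elementary probabilities of the outcomes in E."""
    return sum((p[w] for w in event), Fraction(0))

def all_events():
    """The power set P(omega), i.e. the discrete sigma-algebra."""
    return [set(c) for r in range(5) for c in combinations(omega, r)]

E, F = {1, 2}, {2, 3}
assert prob(omega - E) == 1 - prob(E)                  # property 2
assert prob(F - E) == prob(F) - prob(E & F)            # property 3
assert prob(E | F) == prob(E) + prob(F) - prob(E & F)  # property 6
for A in all_events():                                 # properties 4 and 5
    assert prob(A) <= 1
    for B in all_events():
        if A <= B:
            assert prob(A) <= prob(B)
```

The check over `all_events()` is exhaustive here because the sample space is finite and small; it is a sanity check, not a proof.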

Moreover, we have

Proposition 2.2 (Properties of a probability) Let \(\left(E_{n}\right)_{n\geq1}\) be a sequence of events in \(\mathcal{E}\). Assume that \(\left(E_{n}\right)_{n\geq1}\) is increasing [resp. decreasing], that is \[\begin{equation} E_{n}\subseteq E_{n+1}\qquad\text{[resp. }E_{n}\supseteq E_{n+1}\text{]}, \end{equation}\] for every \(n\in\mathbb{N}\). Then \[\begin{equation} \mathbf{P}\left({\textstyle\bigcup_{n=1}^{\infty}}E_{n}\right) =\lim_{n\rightarrow\infty}\mathbf{P}(E_{n}) \qquad \text{[resp. }\mathbf{P}\left({\textstyle\bigcap_{n=1}^{\infty}} E_{n}\right)=\lim_{n\rightarrow\infty}\mathbf{P}(E_{n})\text{].} \end{equation}\]
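
Proposition 2.2 can be illustrated numerically. In the sketch below (an example of our own), we take the increasing events \(E_{n}\) = "heads appears within the first \(n\) flips of a fair coin", for which \(\mathbf{P}\left(E_{n}\right)=1-2^{-n}\), and watch the probabilities increase towards \(\mathbf{P}\left(\bigcup_{n}E_{n}\right)=1\).

```python
from fractions import Fraction

# E_n = "heads within the first n flips of a fair coin": an increasing
# sequence of events with P(E_n) = 1 - (1/2)^n and P(union of all E_n) = 1.
def prob_E(n):
    return 1 - Fraction(1, 2) ** n

probs = [prob_E(n) for n in range(1, 31)]
assert all(a <= b for a, b in zip(probs, probs[1:]))  # P(E_n) is increasing
assert float(1 - probs[-1]) < 1e-8                    # P(E_n) -> 1 as n grows
```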

Summarizing, all outcomes of a random phenomenon constitute the sample space of the random phenomenon and are also referred to as sample points. Sets of outcomes, that is subsets of the sample space, are called events. Families of events satisfying some technical properties, called \(\sigma\)-algebras of events, represent various pieces of information which may be available to different observers, or to the same observer under different circumstances or at different times. A probability is a function that ranks the events of a piece of information according to the possibility of their occurrence.

Remarkable events of a sample space are: the sure event, that is the set of all possible outcomes of the random phenomenon, which corresponds to the sample space \(\Omega\) itself; the impossible event, that is the set of all impossible outcomes of the random phenomenon, to say the outcomes characterized by contradictory properties, which corresponds to the empty subset \(\mathbb{\varnothing}\) of \(\Omega\); the elementary events, which are the events made up by a single outcome \(\omega\in\Omega\) and correspond to the sets \(\left\{\omega\right\}\subseteq\Omega\), on varying of \(\omega\in\Omega\).

Remarkable \(\sigma\)-algebras of events are the trivial \(\sigma\)-algebra of events, which is the family \(\left\{\Omega,\mathbb{\varnothing}\right\}\), corresponding to the smallest information which may be available on a random phenomenon, and the discrete \(\sigma\)-algebra, that is the family of all events of a sample space \(\Omega\), also referred to as the power set of \(\Omega\) and denoted by \(\mathcal{P}\left(\Omega\right)\). The latter represents the largest information we may have on a random phenomenon.

Example 2.1 (Coin flip) We flip a coin (not necessarily fair). The only possible outcomes of the flip are the front or back face of the coin, referred to as heads and tails, respectively. As is customary, let us write \(1\) [resp. \(0\)] for heads [resp. tails]. Thus, the sample space \(\Omega\) can be represented by the set \(\left\{1,0\right\}\). We can think of the sure [resp. impossible] event \(\Omega\) [resp. \(\mathbb{\varnothing}\)] as the event “the outcome of the flip was heads or tails” [resp. “the outcome of the flip was heads and tails”]. The elementary events are \(\left\{1\right\}\equiv H\) and \(\left\{0\right\}\equiv T\), which correspond to the events “the outcome of the flip was heads” and “the outcome of the flip was tails”, respectively. The family of all events of \(\Omega\) is given by \(\mathcal{P}\left(\Omega\right)=\left\{\Omega,\mathbb{\varnothing},H,T\right\}\). Note that \(\mathcal{P}\left(\Omega\right)\) contains \(4=2^{2}\) elements. To introduce a probability on the events in \(\mathcal{P}\left(\Omega\right)\), we consider a pair of real numbers \(\left(p,q\right)\) such that \[\begin{equation} p,q\geq 0 \quad\text{and}\quad p+q=1. \tag{2.1} \end{equation}\] Such a pair is called a probability distribution on the sample space of a coin flip. Hence, we define a probability \(\mathbf{P}:\mathcal{P}\left(\Omega \right)\rightarrow \mathbb{R}_{+}\) by the rule \[\begin{equation} \mathbf{P}\left(E\right)\overset{\text{def}}{=}\left\{ \begin{array}{ll} 0, & \text{if }E=\varnothing,\\ p, & \text{if }1\in E\text{ and }0\notin E,\\ q, & \text{if }1\notin E\text{ and }0\in E,\\ 1, & \text{if }E=\Omega . \end{array} \right. \tag{2.2} \end{equation}\] Note that choosing \(p=q=1/2\) in Equation (2.1) yields the probability distribution of a fair coin flip.
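
The rule in Equation (2.2) can be transcribed almost verbatim. The following Python sketch is an illustration of ours, with an arbitrary bias \(p=2/3\): it enumerates the four events of \(\mathcal{P}\left(\Omega\right)\) and checks additivity on the two elementary events.

```python
from fractions import Fraction

# The probability of Equation (2.2) on P(Omega) = {empty, H, T, Omega},
# for a hypothetical biased coin with P(heads) = p.
p = Fraction(2, 3)
q = 1 - p          # the pair (p, q) is a probability distribution: p + q = 1

def P(event):
    event = frozenset(event)
    if event == frozenset():       return Fraction(0)   # impossible event
    if event == frozenset({1}):    return p             # H: "heads"
    if event == frozenset({0}):    return q             # T: "tails"
    if event == frozenset({0, 1}): return Fraction(1)   # sure event
    raise ValueError("not an event of the coin-flip sample space")

assert P({1}) + P({0}) == P({0, 1})  # additivity on the disjoint events H, T
```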

Example 2.2 (Die roll) We roll a die. Each of the six faces of the die is marked with spots: from one to six spots. Clearly, the possible outcomes of the roll can be represented by the numbers \(1,\dots,6\), of the spots marking the faces. Thus, the sample space \(\Omega\) can be represented by the set \(\left\{1,\dots,6\right\}\), that is \(\Omega\equiv\left\{1,\dots,6\right\}\). We can think of the sure event \(\Omega\) as the event “the outcome of the roll was \(1\) or \(2\) or \(\dots\) or \(6\)”. We can think of the impossible event \(\mathbb{\varnothing}\) as the event “the outcome of the roll was \(1\) and \(2\)”, but also “the outcome of the roll was \(2\) and \(3\)” and so on. The elementary event “the outcome of the roll is \(k\)” is represented by the subset \(\left\{k\right\}\), on varying of \(k=1,\dots,6\). The event “the outcome of the roll is odd [resp. even]” is represented by the subset \(E_{\mathbb{O}}\equiv\left\{1,3,5\right\}\) [resp. \(E_{\mathbb{E}}\equiv\left\{2,4,6\right\}\)] of \(\Omega\). In roster form, the family of all events \(\mathcal{P}\left(\Omega\right)\) is given by \[\begin{equation} \left\{\Omega, \mathbb{\varnothing}, \left\{1\right\}, \dots,\left\{6\right\} , \left\{1,2\right\},\dots,\left\{5,6\right\}, \left\{1,2,3\right\},\dots,\left\{4,5,6\right\} ,\dots, \left\{1,2,3,4,5\right\},\dots,\left\{2,3,4,5,6\right\}\right\}. \end{equation}\] By combinatorics, it is not difficult to prove that \(\mathcal{P}\left(\Omega\right)\) contains \(2^{6}=64\) distinct events. To introduce a probability on the events in \(\mathcal{P}\left(\Omega\right)\), we consider a sequence of six real numbers \(\left(p_{k}\right)_{k=1}^{6}\) such that \[\begin{equation} p_{k}\geq 0,\quad\forall k=1,\dots,6 \quad\text{and}\quad\sum_{k=1}^{6}p_{k}=1. \tag{2.3} \end{equation}\] Such a sequence is called a probability distribution on the sample space of a die roll.
Hence, we define a probability \(\mathbf{P}:\mathcal{P}\left(\Omega \right)\rightarrow\mathbb{R}_{+}\) by the rule \[\begin{equation} \mathbf{P}\left(E\right)\overset{\text{def}}{=}\sum_{k\in E}p_{k}. \tag{2.4} \end{equation}\] Note that, according to Equation (2.4), we obtain \[\begin{equation} \mathbf{P}\left(\varnothing \right)=0, \quad \mathbf{P}\left(\Omega\right)=1, \quad \mathbf{P}\left(k\right)=p_{k},\quad \forall k=1,\dots,6, \end{equation}\] where \(\mathbf{P}\left(k\right)\) is the standard abbreviation for \(\mathbf{P}\left(\left\{k\right\}\right)\), for \(k=1,\dots,6\). Note also that \[\begin{equation} \mathbf{P}\left(E_{\mathbb{O}}\right)=p_{1}+p_{3}+p_{5} \quad\text{and}\quad \mathbf{P}\left(E_{\mathbb{E}}\right)=p_{2}+p_{4}+p_{6}. \end{equation}\] In the end, note that choosing \(p_{1}=\cdots=p_{6}=1/6\) in Equation (2.3) yields the probability distribution of the roll of a fair die. Another family of events is given by the family \[\begin{equation} \mathcal{E}\equiv \left\{\Omega, \mathbb{\varnothing}, E_{\mathbb{O}}, E_{\mathbb{E}}\right\}. \end{equation}\] Note that rolling a die and having available only the information represented by \(\mathcal{E}\) is equivalent to flipping a coin. In this case, to assign a probability distribution on the events in \(\mathcal{E}\), we just need to resort to a probability distribution of a coin flip and assign a probability by the rule \[\begin{equation} \mathbf{P}\left(E\right)\overset{\text{def}}{=}\left\{ \begin{array}{ll} 0, & \text{if }E=\varnothing, \\ p, & \text{if }E=E_{\mathbb{O}}, \\ q, & \text{if }E=E_{\mathbb{E}}, \\ 1, & \text{if }E=\Omega . \end{array} \right. \tag{2.5} \end{equation}\] Note that to obtain a probability on the events in \(\mathcal{E}\) which is coherent with the probability on the events in \(\mathcal{P}\left(\Omega\right)\) it is necessary and sufficient to choose \[\begin{equation} p=p_{1}+p_{3}+p_{5}\quad\text{and}\quad q=p_{2}+p_{4}+p_{6}. \end{equation}\]
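
The two constructions of the example, Equation (2.4) on \(\mathcal{P}\left(\Omega\right)\) and the coherent coin-flip probability (2.5) on \(\mathcal{E}\), can be checked side by side. In the sketch below, the loaded-die distribution is a hypothetical choice of our own.

```python
from fractions import Fraction

# A hypothetical loaded die (p_1, ..., p_6), summing to one as in (2.3).
p_k = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 6),
       4: Fraction(1, 6), 5: Fraction(1, 12), 6: Fraction(1, 12)}
assert sum(p_k.values()) == 1

def P(event):
    """Equation (2.4): P(E) = sum of p_k over k in E."""
    return sum((p_k[k] for k in event), Fraction(0))

E_odd, E_even = {1, 3, 5}, {2, 4, 6}
# The coherent coin-flip distribution on the coarse sigma-algebra E:
p, q = P(E_odd), P(E_even)
assert p == p_k[1] + p_k[3] + p_k[5] and q == p_k[2] + p_k[4] + p_k[6]
assert p + q == 1
```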

Example 2.3 (Three flips of a coin) We flip a coin three times indexed by the numbers \(1,2,3\). Referring to Example 2.1, the only possible outcomes of each flip are the numbers \(1\) or \(0\) that we agreed to use to represent the outcome “heads” and “tails”, respectively. Thus, the sample space \(\Omega\) can be represented by the set of all ordered triples \(\left(\omega_{1},\omega_{2},\omega_{3}\right)\), where \(\omega_{k}=1\) or \(\omega_{k}=0\), for every \(k=1,2,3\). In roster form, \[\begin{equation} \Omega\equiv\left\{\left(1,1,1\right),\left(1,1,0\right),\left(1,0,1\right), \left(1,0,0\right),\left(0,1,1\right),\left(0,1,0\right),\left(0,0,1\right),\left(0,0,0\right)\right\}. \end{equation}\] Note that \(\Omega\) contains \(2^{3}=8\) elements. Therefore, the discrete \(\sigma\)-algebra \(\mathcal{P}\left(\Omega\right)\) contains \(2^{8}=256\) elements. Besides the sure event \(\Omega\), the impossible event \(\mathbb{\varnothing}\) and the elementary events \(E_{\omega_{1},\omega_{2},\omega_{3}}\equiv\left\{\left(\omega_{1},\omega_{2},\omega_{3}\right)\right\}\), on varying of \(\omega_{k}=1,0\), for \(k=1,2,3\), the family \(\mathcal{P}\left(\Omega\right)\equiv\mathcal{E}_{3}\) contains also events of the type \[\begin{equation} E_{1,\omega_{2},\omega_{3}}\equiv\left\{\left(1,1,1\right),\left(1,1,0\right), \left(1,0,1\right),\left(1,0,0\right)\right\}, \end{equation}\] which can be read “the outcome of the first flip of the coin was heads”, or \[\begin{equation} E_{0,\omega_{2},\omega_{3}}\equiv\left\{\left(0,1,1\right),\left(0,1,0\right), \left(0,0,1\right),\left(0,0,0\right)\right\}=E_{1,\omega_{2},\omega_{3}}^{c}, \end{equation}\] which can be read “the outcome of the first flip of the coin was tails”, or \[\begin{equation} E_{1,\omega_{2},0}\equiv\left\{\left(1,1,0\right),\left(1,0,0\right)\right\} =E_{1,\omega_{2},\omega_{3}}\cap E_{\omega_{1},\omega_{2},0}, \end{equation}\] which can be read “the outcome of the first flip of the coin was heads and 
the outcome of the third flip was tails”, and so on. Other families of events are the trivial \(\sigma\)-algebra \(\left\{\Omega,\mathbb{\varnothing}\right\}\equiv\mathcal{E}_{0}\), the family \(\mathcal{E}_{1}\) given by \[\begin{equation} \mathcal{E}_{1}\equiv \left\{\Omega,\mathbb{\varnothing},E_{1},E_{0}\right\} \end{equation}\] where \(E_{1}\) [resp. \(E_{0}\)] is an abbreviation for \(E_{1,\omega_{2},\omega_{3}}\) [resp. \(E_{0,\omega_{2},\omega_{3}}\)] and \[\begin{equation} \mathcal{E}_{2}\equiv \left\{\Omega,\mathbb{\varnothing}, E_{1}, E_{0}, E_{1,1}, E_{1,0}, E_{0,1}, E_{0,0}, E^{c}_{1,1}, E^{c}_{1,0}, E^{c}_{0,1}, E^{c}_{0,0}, E_{1,1}\cup E_{0,1}, E_{1,1}\cup E_{0,0}, E_{1,0}\cup E_{0,1}, E_{1,0}\cup E_{0,0}\right\}, \end{equation}\] where \(E_{1,1}\equiv\left\{\left(1,1,1\right),\left(1,1,0\right)\right\}\), \(E_{1,0}\equiv\left\{\left(1,0,1\right),\left(1,0,0\right)\right\}\), \(E_{0,1}\equiv\left\{\left(0,1,1\right),\left(0,1,0\right)\right\}\), and \(E_{0,0}\equiv\left\{\left(0,0,1\right),\left(0,0,0\right)\right\}\). Note that we have \[\begin{equation} \mathcal{E}_{0}\subseteq\mathcal{E}_{1}\subseteq\mathcal{E}_{2}\subseteq\mathcal{E}_{3}. \end{equation}\] Finally, the \(\sigma\)-algebra \(\mathcal{E}_{0}\) represents the information which is available to an observer before the coin is flipped and the \(\sigma\)-algebra \(\mathcal{E}_{t}\) represents the information available after the \(t\)th flip, for \(t=1,2,3\), assuming the observer does not forget the past flips.
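
The \(\sigma\)-algebras of the example can be generated exhaustively: the \(\sigma\)-algebra of the first \(t\) flips consists of all unions of the \(2^{t}\) atoms "the first \(t\) flips equal a given pattern". The sketch below is our own check of the sizes and inclusions.

```python
from itertools import product, combinations, chain

# Omega for three coin flips, and the sigma-algebra generated by the
# first t coordinates, built as all unions of its 2^t atoms.
Omega = set(product((0, 1), repeat=3))

def sigma_algebra(t):
    atoms = [frozenset(w for w in Omega if w[:t] == pat)
             for pat in product((0, 1), repeat=t)]
    return {frozenset(chain.from_iterable(c))
            for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)}

E = [sigma_algebra(t) for t in range(4)]
assert [len(e) for e in E] == [2, 4, 16, 256]  # 2^(2^t) events at time t
assert E[0] <= E[1] <= E[2] <= E[3]            # the filtration E_0 ⊆ ... ⊆ E_3
```

In particular, the last \(\sigma\)-algebra is the discrete one, with \(2^{2^{3}}=2^{8}=256\) events.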

We can now introduce the definition of real random variable.

Let \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) be a probability space.

Definition 2.4 (Random Variable) We say that a function \(X:\Omega\rightarrow\mathbb{R}^{N}\), briefly denoted also by \(X\), is an \(N\)-variate real \(\mathcal{E}\)-random variable or an \(N\)-dimensional real \(\mathcal{E}\)-random vector if the \(X\)-inverse image of any closed interval of \(\mathbb{R}^{N}\) is an event in \(\mathcal{E}\). In symbols \[\begin{equation} \left\{a\leq X\leq b\right\}\in\mathcal{E}, \quad\forall\,\left[a,b\right]\equiv\mathsf{X}_{n=1}^{N}\left[a_{n},b_{n}\right], \tag{2.6} \end{equation}\] where \(\left\{a\leq X\leq b\right\}\) is the standard abbreviation for \(\left\{\omega\in\Omega:X\left(\omega\right)\in\left[a,b\right]\right\}\). If \(X\) is an \(N\)-variate real \(\mathcal{E}\)-random variable, the value \(X\left(\omega\right)\in\mathbb{R}^{N}\) taken by \(X\) on the generic sample point \(\omega\in\Omega\) is also referred to as a realization of \(X\).

Equation (2.6) gives a mathematical form to the requirement that, in order to call a function a random variable, we need to be able to observe its realizations in light of the available information, represented in turn by the \(\sigma\)-algebra \(\mathcal{E}\). We stress this idea further with a simple example.

Example 2.4 (Non random variable) Let \(\Omega\equiv\left\{1,\dots, 6\right\}\) be the sample space of all possible outcomes of the roll of a die (see Example 2.2) and let \(X:\Omega\rightarrow\mathbb{R}\) be the function given by \[\begin{equation} X\left(k\right)\overset{\text{def}}{=} \left\{ \begin{array}{ll} k, & \text{if }k\text{ is odd}, \\ 1-k, & \text{if }k\text{ is even}, \end{array} \right. \quad\forall k=1,\dots,6. \end{equation}\] If \(\mathcal{E}\equiv\mathcal{P}\left(\Omega\right)\) is the complete information on \(\Omega\), then \(X\) is clearly an \(\mathcal{E}\)-random variable. On the other hand, choosing \[\begin{equation} \mathcal{E}\equiv\left\{\mathbb{\varnothing}, \Omega, E_{\mathbb{O}}, E_{\mathbb{E}}\right\}, \end{equation}\] then \(X\) is not an \(\mathcal{E}\)-random variable. To show this, it is sufficient to observe that, under the above choice, we have \[\begin{equation} \left\{\tfrac{1}{2}\leq X\leq\tfrac{3}{2}\right\}=\left\{1\right\}\notin\mathcal{E}. \end{equation}\] Hence, Equation (2.6) is not satisfied at least for the choice of the interval \(\left[1/2,3/2\right]\). Now, assume we want to build a function \(\hat{X}:\Omega\rightarrow\mathbb{R}\) which takes observable values in light of the information \(\mathcal{E}\) and best approximates \(X\) according to the ordinary least squares (OLS) method. This is referred to as the real \(\mathcal{E}\)-random variable OLS estimator of \(X\). First, since the values taken by \(\hat{X}\) have to be observable in light of the information provided by \(\mathcal{E}\), \(\hat{X}\) has to take the same value on the outcomes of \(\Omega\) which are favorable to the events \(E_{\mathbb{O}}\) and \(E_{\mathbb{E}}\). Otherwise, Equation (2.6) would again fail to be satisfied, for some interval \(\left[a, b\right]\) in \(\mathbb{R}\).
Therefore, we must have \[\begin{equation} \hat{X}(1)=\hat{X}(3)=\hat{X}(5)\equiv\hat{x}_{1} \quad\text{and}\quad \hat{X}(2)=\hat{X}(4)=\hat{X}(6)\equiv\hat{x}_{2}. \end{equation}\] Equivalently, \[\begin{equation} \hat{X}=\hat{x}_{1}\cdot 1_{E_{\mathbb{O}}} + \hat{x}_{2}\cdot 1_{E_{\mathbb{E}}}, \end{equation}\] for some \(\hat{x}_{1},\hat{x}_{2}\in\mathbb{R}\), where \(1_{E_{\mathbb{O}}}\) [resp. \(1_{E_{\mathbb{E}}}\)] is the indicator function of the event \(E_{\mathbb{O}}\) [resp. \(E_{\mathbb{E}}\)]. Second, applying the least squares approximation method, which implicitly assumes we deal with a fair die, we compute \[\begin{eqnarray*} \sum_{k=1}^{6}\left(X\left(k\right)-\hat{X}\left(k\right)\right)^{2} &=&\sum_{k=1,3,5}\left(X\left(k\right)-\hat{X}\left(k\right) \right)^{2} +\sum_{k=2,4,6}\left(X\left(k\right)-\hat{X}\left(k\right)\right)^{2} \\ &=&\sum_{k=1,3,5}\left(k-\hat{x}_{1}\right)^{2}+\sum_{k=2,4,6}\left(1-k-\hat{x}_{2}\right)^{2} \\ &=&\left(1-\hat{x}_{1}\right)^{2}+\left(3-\hat{x}_{1}\right)^{2}+\left(5-\hat{x}_{1}\right)^{2} +\left(1+\hat{x}_{2}\right)^{2}+\left(3+\hat{x}_{2}\right)^{2}+\left(5+\hat{x}_{2}\right)^{2} \end{eqnarray*}\] and, to determine the values of \(\hat{x}_{1}\) and \(\hat{x}_{2}\) which minimize the above sum of squares, we apply the first-order conditions. Differentiating with respect to \(\hat{x}_{1}\) and \(\hat{x}_{2}\), we obtain \[\begin{equation} -2\left(1-\hat{x}_{1}\right)-2\left(3-\hat{x}_{1}\right)-2\left(5-\hat{x}_{1}\right)=0 \quad\text{and}\quad 2\left(1+\hat{x}_{2}\right) +2\left(3+\hat{x}_{2}\right) +2\left(5+\hat{x}_{2}\right)=0. \end{equation}\] These yield \[\begin{equation} \hat{x}_{1}=3,\qquad \hat{x}_{2}=-3. \end{equation}\] It follows \[\begin{equation} \hat{X}=3\cdot 1_{E_{\mathbb{O}}}-3\cdot 1_{E_{\mathbb{E}}}. 
\end{equation}\] Another method, which relies on the Radon–Nikodym representation theorem and does not require the implicit assumption of a fair die, prescribes equating the mean values of the random variables \(X\) and \(\hat{X}\) over the events in the \(\sigma\)-algebra \(\mathcal{E}\). In the case considered, we obtain the equations \[\begin{equation} \hat{x}_{1}p=X\left(1\right)p_{1}+X\left(3\right)p_{3}+X\left(5\right)p_{5} \quad\text{and}\quad \hat{x}_{2}q=X\left(2\right)p_{2}+X\left(4\right)p_{4}+X\left(6\right)p_{6}, \end{equation}\] from which it follows \[\begin{equation} \hat{x}_{1}\!=\!\frac{X\left(1\right)p_{1}+X\left(3\right)p_{3}+X\left(5\right)p_{5}}{p} \!=\!\frac{p_{1}+3p_{3}+5p_{5}}{p_{1}+p_{3}+p_{5}} \!\!\quad \text{and}\quad\!\! \hat{x}_{2}\!=\!\frac{X\left(2\right)p_{2}+X\left(4\right)p_{4}+X\left(6\right)p_{6}}{q} \!=\!-\frac{p_{2}+3p_{4}+5p_{6}}{p_{2}+p_{4}+p_{6}}. \end{equation}\] Note that under the assumption of a fair die, that is \(p_{1}=\cdots =p_{6}=1/6\), we obtain again \[\begin{equation} \hat{x}_{1}=3\quad \text{and}\quad \hat{x}_{2}=-3. \end{equation}\] This second type of estimator of the random variable \(X\), accounting for the reduced information \(\mathcal{E}\), is known as the conditional expectation of \(X\) given \(\mathcal{E}\) and is usually denoted by \(\mathbf{E}\left[X\mid\mathcal{E}\right]\).
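
The second estimator of Example 2.4 can be reproduced numerically. The sketch below (our own transcription) computes \(\hat{x}_{1}\) and \(\hat{x}_{2}\) by averaging \(X\) over the atoms \(E_{\mathbb{O}}\) and \(E_{\mathbb{E}}\) for an arbitrary distribution \(\left(p_{1},\dots,p_{6}\right)\), and recovers \(\hat{x}_{1}=3\), \(\hat{x}_{2}=-3\) in the fair case.

```python
from fractions import Fraction

def X(k):
    """The non-observable variable of Example 2.4."""
    return k if k % 2 == 1 else 1 - k

def cond_exp(p_k):
    """E[X | E]: average X over the atoms {1,3,5} and {2,4,6}."""
    odd, even = (1, 3, 5), (2, 4, 6)
    x1 = sum(X(k) * p_k[k] for k in odd) / sum(p_k[k] for k in odd)
    x2 = sum(X(k) * p_k[k] for k in even) / sum(p_k[k] for k in even)
    return x1, x2

fair = {k: Fraction(1, 6) for k in range(1, 7)}
assert cond_exp(fair) == (3, -3)  # agrees with the OLS values for a fair die
```

Feeding `cond_exp` any other distribution gives the general formulas displayed in the example.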

It is possible to prove that, under rather general assumptions, the conditional expectation of a random variable is the best estimator of the random variable accounting for a reduced information. Under stronger assumptions, the conditional expectation coincides with the OLS estimator.

Due to the importance of the conditional expectation in time series forecasting, we present more details about it in Section 2.1.

2.1 Conditional Expectation

Let \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) be a probability space and let \(\mathcal{F}\) be a \(\sigma\)-algebra modeling a reduction in information on the random phenomenon represented by \(\mathcal{E}\), that is \(\mathcal{F}\subseteq\mathcal{E}\). Consider the restriction of the probability \(\mathbf{P}:\mathcal{E}\rightarrow\mathbb{R}_{+}\) to \(\mathcal{F}\), which is the probability \(\mathbf{P}_{\mid\mathcal{F}}:\mathcal{F}\rightarrow\mathbb{R}_{+}\) trivially given by \[\begin{equation} \mathbf{P}_{\mid\mathcal{F}}\left(F\right)\overset{\text{def}}{=}\mathbf{P}\left(F\right),\quad\forall F\in\mathcal{F}, \end{equation}\] and consider the probability space \(\left(\Omega,\mathcal{F},\mathbf{P}_{\mid\mathcal{F}}\right)\equiv\Omega_{\mathcal{F}}\). For fixed \(M\in\mathbb{N}\), we introduce the Hilbert space \(L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) [resp. \(L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{M}\right)\)] of the \(M\)-dimensional \(\mathcal{E}\) [resp. \(\mathcal{F}\)]-real random vectors having finite moment of order \(2\). Recall that by an \(M\)-dimensional \(\mathcal{E}\) [resp. \(\mathcal{F}\)]-real random vector we mean a map from the sample space \(\Omega\) to the real Euclidean space \(\mathbb{R}^{M}\) which characterizes only observable events in light of the information represented by \(\mathcal{E}\) [resp. \(\mathcal{F}\)].

Definition 2.5 (Conditional expectation) We call the conditional expectation of the random vector \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\), given the \(\sigma\)-algebra \(\mathcal{F}\), the random vector \(\mathbf{E}\left[X\mid\mathcal{F}\right]\in L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{M}\right)\) which solves the minimization problem \[\begin{equation} \mathbf{E}\left[X\mid\mathcal{F}\right]\overset{\text{def}}{=} \arg\min\left\{\mathbf{E}\left[\left(Y-X\right)\left(Y-X\right)^{\intercal}\right]: Y\in L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{M}\right)\right\}. \tag{2.7} \end{equation}\] In particular, when \(M=1\) the conditional expectation of the random variable \(X\in L^{2}\left(\Omega;\mathbb{R}\right)\), given the \(\sigma\)-algebra \(\mathcal{F}\), is the random variable \(\mathbf{E}\left[X\mid\mathcal{F}\right]\in L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}\right)\) which solves the minimization problem \[\begin{equation} \mathbf{E}\left[X\mid\mathcal{F}\right]\overset{\text{def}}{=} \arg\min\left\{\mathbf{E}\left[\left(Y-X\right)^{2}\right]: Y\in L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}\right)\right\}. \tag{2.8} \end{equation}\]

The idea in Definition 2.5 is to build \(\mathbf{E}\left[X\mid\mathcal{F}\right]\) as the estimator of \(X\), depending only on the information represented by \(\mathcal{F}\), which minimizes the mean square error with respect to \(X\): it is the best approximation of the \(\mathcal{E}\)-random vector \(X\) by an \(\mathcal{F}\)-random vector in terms of the mean square distance from \(X\). Since the space \(L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) is endowed with the mean square distance and the space \(L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{M}\right)\) is a subspace of \(L^{2}\left(\Omega;\mathbb{R}^{M}\right)\), the conditional expectation \(\mathbf{E}\left[X\mid\mathcal{F}\right]\) turns out to be the orthogonal projection of \(X\) onto \(L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{M}\right)\). From the properties of Hilbert spaces, it then follows that \(\mathbf{E}\left[X\mid\mathcal{F}\right]\) always exists and is (almost surely) unique. On the other hand, finding the conditional expectation \(\mathbf{E}\left[X\mid\mathcal{F}\right]\) of a generic random variable \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) may be a difficult task. However, we can rely on some properties of \(\mathbf{E}\left[X\mid\mathcal{F}\right]\) and of the operator \(\mathbf{E}\left[\cdot\mid\mathcal{F}\right]:L^{2}\left(\Omega;\mathbb{R}^{M}\right)\rightarrow L^{2}\left( \Omega_{\mathcal{F}};\mathbb{R}^{M}\right)\), referred to as the conditional expectation operator on \(L^{2}\left(\Omega;\mathbb{R}^{M}\right)\), given \(\mathcal{F}\).

Proposition 2.3 (Mean properties of the conditional expectation) We have \[\begin{equation} \int_{F}\mathbf{E}\left[X\mid \mathcal{F}\right]d\mathbf{P}_{\mid\mathcal{F}}=\int_{F}Xd\mathbf{P}, \tag{2.9} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) and every \(F\in\mathcal{F}\). In particular, \[\begin{equation} \mathbf{E}\left[\mathbf{E}\left[X\mid \mathcal{F}\right]\right]=\mathbf{E}\left[X\right], \tag{2.10} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\).

Assume that the \(\sigma\)-algebra \(\mathcal{F}\) is generated by an \(N\)-variate real \(\mathcal{E}\)-random variable \(Y\), for some \(N\in\mathbb{N}\), that is \(\mathcal{F}\) is the smallest piece of information with respect to which the values taken by \(Y\) are observable, in symbols \(\mathcal{F}=\sigma\left(Y\right)\). In this case, the conditional expectation of \(X\) given \(\mathcal{F}\) is typically denoted by \(\mathbf{E}\left[X\mid Y\right]\), rather than \(\mathbf{E}\left[X\mid \sigma\left(Y\right)\right]\), and in some cases it is not difficult to determine it explicitly.

Proposition 2.4 (Conditional expectation given a discrete random variable) Assume that the random variable \(Y\) is discrete, that is \(Y\left(\Omega\right)\equiv\left(y_{j}\right)_{j\in J}\), where \(J\subseteq\mathbb{N}\) and \(\mathbf{P}\left\{Y=y_{j}\right\}>0\), for every \(j\in J\). Then we have \[\begin{equation} \mathbf{E}\left[X\mid Y\right]=\sum_{j\in J}\left(\frac{1}{\mathbf{P}\left(F_{j}\right)}{\int_{F_{j}}Xd\mathbf{P}}\right) 1_{F_{j}}, \tag{2.11} \end{equation}\] where \(F_{j}\equiv\left\{Y=y_{j}\right\}\), for every \(j\in J\).
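
Equation (2.11), together with the mean property (2.9), is easy to verify on a finite sample space. In the sketch below, \(\Omega\), \(X\) and \(Y\) are toy data of our own making; \(Y\) is discrete, so \(\mathbf{E}\left[X\mid Y\right]\) is the atom-wise average of \(X\).

```python
from fractions import Fraction

# A toy finite probability space (our own illustration).
Omega = [0, 1, 2, 3, 4, 5]
P = {w: Fraction(1, 6) for w in Omega}
X = {w: w * w for w in Omega}   # an arbitrary square-integrable variable
Y = {w: w % 2 for w in Omega}   # discrete, takes the values 0 and 1

def cond_exp_given_Y(w):
    """Equation (2.11): E[X | Y](w) = average of X over F_j = {Y = Y(w)}."""
    F = [v for v in Omega if Y[v] == Y[w]]
    return sum(X[v] * P[v] for v in F) / sum(P[v] for v in F)

# Mean property (2.9) on every atom F_j, hence on every F in sigma(Y):
for y in {0, 1}:
    F = [v for v in Omega if Y[v] == y]
    assert sum(cond_exp_given_Y(v) * P[v] for v in F) == sum(X[v] * P[v] for v in F)
# The mean property (2.10), i.e. the law of total expectation:
assert sum(cond_exp_given_Y(v) * P[v] for v in Omega) == sum(X[v] * P[v] for v in Omega)
```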

Proposition 2.5 (Conditional expectation as linear regression) Assume that \(X\) and \(Y\) are both univariate and the conditional expectation \(\mathbf{E}\left[X\mid Y\right]\) is linear in \(Y\), that is \(\mathbf{E}\left[X\mid Y\right]=a+bY\) for some \(a,b\in\mathbb{R}\). Then we have \[\begin{equation} \mathbf{E}\left[X\mid Y\right] =\mathbf{E}\left[X\right] +Corr(X,Y)\frac{\mathbf{D}\left[X\right]}{\mathbf{D}\left[Y\right]}\left(Y-\mathbf{E}\left[Y\right]\right), \tag{2.12} \end{equation}\] where \(\mathbf{D}\left[\cdot\right]\) [resp. \(Corr(\cdot,\cdot)\)] is the standard deviation [resp. correlation] functional.

Proposition 2.6 (Conditional expectation of jointly Gaussian random variables) Assume that \(X\) and \(Y\) are both univariate and jointly Gaussian. Then the conditional expectation \(\mathbf{E}\left[X\mid Y\right]\) is linear in \(Y\), that is Equation (2.12) holds true. Moreover, if also \(X^2\in L^{2}\left(\Omega;\mathbb{R}\right)\), we have \[\begin{equation} \mathbf{E}\left[X^{2}\mid Y\right] =\mathbf{D}^{2}\left[X\right]\left(1-Corr(X,Y)^{2}\right) +\left(\mathbf{E}\left[X\right]+Corr(X,Y)\frac{\mathbf{D}\left[X\right]}{\mathbf{D}\left[Y\right]} \left(Y-\mathbf{E}\left[Y\right]\right)\right)^{2}, \tag{2.13} \end{equation}\] where \(\mathbf{D}\left[\cdot\right]\) [resp. \(Corr(\cdot,\cdot)\)] is the standard deviation [resp. correlation] functional.

Proposition 2.7 (Properties of the conditional expectation operator) The conditional expectation operator has the following properties

  1. linearity: we have \[\begin{equation} \mathbf{E}\left[\alpha_{1}X_{1} + \alpha_{2}X_{2}\mid\mathcal{F}\right] =\alpha_{1}\mathbf{E}\left[X_{1}\mid\mathcal{F}\right] + \alpha_{2}\mathbf{E}\left[X_{2}\mid\mathcal{F}\right], \tag{2.14} \end{equation}\] for all \(\alpha_{1},\alpha_{2}\in\mathbb{R}\) and all \(X_{1},X_{2}\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\);

  2. invariance: we have \[\begin{equation} \mathbf{E}\left[X\mid\mathcal{F}\right]=X, \tag{2.15} \end{equation}\] for every \(X\in L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{M}\right)\);

  3. concentration: we have \[\begin{equation} \mathbf{E}\left[X\mid\mathcal{F}\right]=\mathbf{E}\left[X\right], \tag{2.16} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) independent of \(\mathcal{F}\);

  4. transparence: we have \[\begin{equation} \mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\right]=X\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right] \qquad\text{[resp. } \mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\right]=\mathbf{E}\left[X\mid\mathcal{F}\right]Y^{\intercal} \text{]}, \tag{2.17} \end{equation}\] for every \(X\in L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{M}\right)\) and every \(Y\in L^{2}\left(\Omega;\mathbb{R}^{N}\right)\) [resp. for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) and every \(Y\in L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{N}\right)\)], in particular, \[\begin{equation} \mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]Y^{\intercal}\mid\mathcal{F}\right] =\mathbf{E}\left[X\mid\mathcal{F}\right]\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right] \qquad\text{[resp. } \mathbf{E}\left[X\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right]\mid\mathcal{F}\right] =\mathbf{E}\left[X\mid\mathcal{F}\right]\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right]\text{]}, \tag{2.18} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) and every \(Y\in L^{2}\left(\Omega;\mathbb{R}^{N}\right)\);

  5. law of iterated expectations (LIE): we have \[\begin{equation} \mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]\mid \mathcal{G}\right] =\mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{G}\right]\mid\mathcal{F}\right] =\mathbf{E}\left[X\mid\mathcal{G}\right], \tag{2.19} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) and every \(\sigma\)-algebra \(\mathcal{G}\subseteq\mathcal{F}\);

  6. Jensen inequality: we have \[\begin{equation} \mathbf{E}\left[\phi\left(X\right)\mid\mathcal{F}\right] \geq\phi\left(\mathbf{E}\left[X\mid\mathcal{F}\right]\right), \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) and every convex function \(\phi:\mathbb{R}^{M}\rightarrow\mathbb{R}\) such that \(\phi\circ X\in L^{2}\left(\Omega;\mathbb{R}\right)\), in particular, \[\begin{equation} \mathbf{E}\left[\left\vert X\right\vert\mid\mathcal{F}\right] \geq\left\vert \mathbf{E}\left[X\mid\mathcal{F}\right]\right\vert, \tag{2.20} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\), and \[\begin{equation} \mathbf{E}\left[X\mid\mathcal{F}\right]\geq 0, \tag{2.21} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}\right)\) such that \(X\geq 0\).
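
Several of these properties, in particular invariance, concentration and the law of iterated expectations, can be verified on the finite filtration of Example 2.3. In the sketch below (a toy check of our own), \(\mathcal{G}\) is the \(\sigma\)-algebra of the first flip, \(\mathcal{F}\) that of the first two flips, and \(X\) counts the heads in three fair flips.

```python
from fractions import Fraction
from itertools import product

Omega = list(product((0, 1), repeat=3))
P = {w: Fraction(1, 8) for w in Omega}  # three fair, independent flips

def cond_exp(X, t):
    """E[X | sigma(first t flips)] as a function on Omega: atom-wise average."""
    out = {}
    for w in Omega:
        atom = [v for v in Omega if v[:t] == w[:t]]
        out[w] = sum(X[v] * P[v] for v in atom) / sum(P[v] for v in atom)
    return out

X = {w: sum(w) for w in Omega}                          # number of heads
inner = cond_exp(X, 2)                                  # E[X | F]
assert cond_exp(inner, 1) == cond_exp(X, 1)             # LIE: E[E[X|F]|G] = E[X|G]
assert cond_exp(inner, 2) == inner                      # invariance on F-measurable input
mean = sum(X[w] * P[w] for w in Omega)
assert all(v == mean for v in cond_exp(X, 0).values())  # E[X | trivial] = E[X]
```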

Note that under the assumptions considered for the transparence property 4. we have that \(XY^{\intercal}\in L^{1}(\Omega;\mathbb{R}^{M\times N})\) and \(X\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right]\in L^{1}(\Omega_{\mathcal{F}};\mathbb{R}^{M\times N})\) but, in general, neither \(XY^{\intercal}\in L^{2}(\Omega;\mathbb{R}^{M\times N})\) nor \(X\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right]\in L^{2}(\Omega_{\mathcal{F}};\mathbb{R}^{M\times N})\). Thus, in terms of our definition of conditional expectation, to obtain the result presented in Equation (2.17) it might appear necessary to add the hypotheses \(XY^{\intercal}\in L^{2}(\Omega;\mathbb{R}^{M\times N})\) and \(X\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right]\in L^{2}(\Omega_{\mathcal{F}};\mathbb{R}^{M\times N})\). However, it is possible to prove that the conditional expectation operator can be extended to \(L^{1}(\Omega;\mathbb{R}^{M\times N})\) in such a way that Equation (2.17) continues to hold without additional hypotheses. An analogous consideration applies to Equation (2.18).

Corollary 2.1 (Further properties of the conditional expectation operator) We have \[\begin{equation} \mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\right] =\mathbf{E}\left[X\right]\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right] \qquad\text{[resp. } \mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\right] =\mathbf{E}\left[X\mid\mathcal{F}\right]\mathbf{E}\left[Y\right]^{\intercal} \text{]}, \tag{2.22} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) independent of \(\mathcal{F}\vee\sigma(Y)\) and every \(Y\in L^{2}\left(\Omega;\mathbb{R}^{N}\right)\) [resp. for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{M}\right)\) and every \(Y\in L^{2}\left(\Omega;\mathbb{R}^{N}\right)\) independent of \(\mathcal{F}\vee\sigma(X)\)], where \(\mathcal{F}\vee\sigma(Y)\) [resp. \(\mathcal{F}\vee\sigma(X)\)] is the \(\sigma\)-algebra generated by \(\mathcal{F}\) and \(\sigma(Y)\) [resp. \(\sigma(X)\)].

Proof. By virtue of the LIE property (see Equation (2.19)), we can write \[\begin{equation} \mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\right] =\mathbf{E}\left[\mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\right]\mid\mathcal{F}\vee\sigma(Y)\right] =\mathbf{E}\left[\mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\vee\sigma(Y)\right]\mid\mathcal{F}\right]. \tag{2.23} \end{equation}\] On the other hand, since \(Y\) is clearly \(\left(\mathcal{F}\vee\sigma(Y)\right)\)-measurable, by virtue of the transparence property (see Equation (2.17)) we have \[\begin{equation} \mathbf{E}\left[\mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\vee\sigma(Y)\right]\mid\mathcal{F}\right] =\mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\vee\sigma(Y)\right]Y^{\intercal}\mid\mathcal{F}\right]. \tag{2.24} \end{equation}\] Now, since \(X\) is independent of \(\mathcal{F}\vee\sigma(Y)\), thanks to the concentration and linearity properties (see Equations (2.16) and (2.14)), we have \[\begin{equation} \mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\vee\sigma(Y)\right]Y^{\intercal}\mid\mathcal{F}\right] =\mathbf{E}\left[\mathbf{E}\left[X\right]Y^{\intercal}\mid\mathcal{F}\right] =\mathbf{E}\left[X\right]\mathbf{E}\left[Y^{\intercal}\mid\mathcal{F}\right]. \tag{2.25} \end{equation}\] Combining Equations (2.23)-(2.25) we obtain the first part of Equation (2.22). The proof of the second part is perfectly analogous.

Note that for the validity of Equation (2.22) it is not sufficient to assume that \(X\) [resp. \(Y\)] is independent of both \(\mathcal{F}\) and \(\sigma(Y)\) [resp. \(\sigma(X)\)] separately. Indeed, the independence of \(X\) [resp. \(Y\)] from \(\mathcal{F}\vee\sigma(Y)\) [resp. \(\mathcal{F}\vee\sigma(X)\)] is a strictly stronger assumption.

Corollary 2.2 (Further properties of the conditional expectation operator) We have \[\begin{equation} \mathbf{D}^{2}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]\right] =\mathbf{E}\left[ \mathbf{E}\left[ X\mid\mathcal{F}\right]^{2}\right]-\mathbf{E}\left[X\right]^{2}, \tag{2.26} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}\right)\). As a consequence, assuming also \(X^{2}\in L^{2}\left(\Omega;\mathbb{R}\right)\), we have \[\begin{equation} \mathbf{D}^{2}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]\right]\leq\mathbf{D}^{2}\left[X\right]. \tag{2.27} \end{equation}\]

Proof. On account of (2.10), a straightforward computation gives \[\begin{equation} \mathbf{D}^{2}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]\right] =\mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]^{2}\right] -\mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]\right]^{2} =\mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]^{2}\right] -\mathbf{E}\left[X\right]^{2}. \end{equation}\] This proves (2.26). Now, applying Jensen's inequality with \(\phi\left(x\right)\overset{\text{def}}{=}x^{2}\), together with (2.10), we obtain \[\begin{equation} \mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]^{2}\right] \leq\mathbf{E}\left[\mathbf{E}\left[X^{2}\mid\mathcal{F}\right]\right] =\mathbf{E}\left[X^{2}\right], \tag{2.28} \end{equation}\] and the desired (2.27) immediately follows by combining (2.26) and (2.28).

Corollary 2.3 (Orthogonality of the conditional expectation operator) We also have \[\begin{equation} Cov\left(X-\mathbf{E}\left[X\mid\mathcal{F}\right],Y\right)=0, \tag{2.29} \end{equation}\] for every \(X\in L^{2}\left(\Omega;\mathbb{R}^{N}\right)\) and every \(Y\in L^{2}(\Omega_{\mathcal{F}};\mathbb{R}^{N})\), where \(Cov(\cdot,\cdot)\) is the covariance functional. In particular, \[\begin{equation} Cov\left(X-\mathbf{E}\left[X\mid\mathcal{F}\right],\mathbf{E}\left[X\mid\mathcal{F}\right]\right)=0. \tag{2.30} \end{equation}\]

Proof. Considering Equations (2.14) and (2.10), together with the transparence property (2.17) (recall that \(Y\) is \(\mathcal{F}\)-measurable), we can write \[\begin{align} Cov\left(X-\mathbf{E}\left[X\mid\mathcal{F}\right],Y\right) & =\mathbf{E}\left[\left(X-\mathbf{E}\left[X\mid\mathcal{F}\right]\right)Y^{\intercal}\right] -\mathbf{E}\left[X-\mathbf{E}\left[X\mid\mathcal{F}\right]\right]\mathbf{E}\left[Y^{\intercal}\right] \\ & =\mathbf{E}\left[XY^{\intercal}-\mathbf{E}\left[X\mid\mathcal{F}\right]Y^{\intercal}\right] -\left(\mathbf{E}\left[X\right]-\mathbf{E}\left[\mathbf{E}\left[X\mid\mathcal{F}\right]\right]\right) \mathbf{E}\left[Y^{\intercal}\right] \\ & =\mathbf{E}\left[XY^{\intercal}\right]-\mathbf{E}\left[\mathbf{E}\left[XY^{\intercal}\mid\mathcal{F}\right]\right] -\left(\mathbf{E}\left[X\right]-\mathbf{E}\left[X\right]\right)\mathbf{E}\left[Y^{\intercal}\right] \\ & =\mathbf{E}\left[XY^{\intercal}\right]-\mathbf{E}\left[XY^{\intercal}\right] \\ & =0. \end{align}\]

The property expressed by Equation (2.29) characterizes the conditional expectation operator \(\mathbf{E}\left[\cdot\mid\mathcal{F}\right]:L^{2}\left(\Omega;\mathbb{R}^{N}\right)\rightarrow L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{N}\right)\) as the orthogonal projection of \(L^{2}\left(\Omega;\mathbb{R}^{N}\right)\) onto the subspace \(L^{2}\left(\Omega_{\mathcal{F}};\mathbb{R}^{N}\right)\).
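When \(\mathcal{F}\) is generated by a finite partition of \(\Omega\), the conditional expectation reduces to group-wise averaging, and the orthogonality property above can be checked numerically. The following R sketch is purely illustrative (the object names are ours, not part of these notes' codebase): the residual \(X-\mathbf{E}\left[X\mid\mathcal{F}\right]\) has zero sample covariance with any \(\mathcal{F}\)-measurable variable.

```r
# Illustrative sketch: F generated by a finite partition (the groups g).
set.seed(1)
g <- sample(1:3, size=1000, replace=TRUE)   # group labels generating F
X <- rnorm(1000, mean=g, sd=1)              # X depends on the group
E_X_F <- ave(X, g)                          # E[X | F]: group-wise sample means
Y <- g^2                                    # an F-measurable random variable
cov(X - E_X_F, Y)                           # zero up to rounding error
cov(X - E_X_F, E_X_F)                       # residual orthogonal to E[X | F]
```

Here the orthogonality is exact (not merely asymptotic), because `ave()` subtracts the group sample mean, so the residuals sum to zero within each group.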

3 Models for Time Series

Let \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) be a probability space and let \(\mathbb{T}\) be a non-empty subset of the real line \(\mathbb{R}\).

Definition 3.1 (Stochastic Process) We call an \(N\)-variate real stochastic process on \(\Omega\) with time set \(\mathbb{T}\) and state space \(\mathbb{R}^{N}\) any family \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) of \(N\)-variate real random variables on \(\Omega\). More specifically, when \(\mathbb{T}\) is a subset of \(\mathbb{Z}\) [resp. an interval of \(\mathbb{R}\)], we speak of a discrete-time [resp. continuous-time] stochastic process; when \(N=1\) we speak of a real stochastic process.

Definition 3.2 (Sample Path) For any \(\omega\in\Omega\), we call the \(\omega\)-sample path of the stochastic process \(\mathbf{X}\) the family \(\left(x_{t}\right)_{t\in\mathbb{T}}\) of points in \(\mathbb{R}^{N}\) given by \[\begin{equation} x_{t}\overset{\text{def}}{=}X_{t}\left(\omega\right), \quad\forall t\in\mathbb{T}. \end{equation}\] The sample paths of a stochastic process are also called the trajectories or realizations of the process.

Some comments are in order. When dealing with a random phenomenon, we call the sample space of the random phenomenon, denoted by \(\Omega\), the set of all possible outcomes of the random phenomenon. We call an event any set of outcomes, that is, any subset of the sample space \(\Omega\). Thereafter, we consider a suitable family of events \(\mathcal{E}\), a \(\sigma\)-algebra of events, to represent the available information on the random phenomenon, and introduce a normalized measure of the possibility of occurrence of the events in \(\mathcal{E}\), called probability and denoted by \(\mathbf{P}:\mathcal{E}\rightarrow\mathbb{R}_{+}\). The triple of the sample space, the family of events, and the probability, in symbols \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\), constitutes a probability space. When dealing with a stochastic phenomenon, the terminology does not change, but the objects it refers to somehow escalate to account for the role of time. For instance, when considering the flip of a coin we know that we have two possible outcomes, heads or tails. This leads us to introduce the sample space \(\Omega\equiv\{1,0\}\) and the family of all possible events \(\mathcal{E}\equiv\mathcal{P}\left(\Omega\right)\equiv\left\{\Omega,\varnothing,\left\{1\right\},\left\{0\right\}\right\}\). On the other hand, if we repeat the flip of a coin \(T\) times, for some \(T\in\mathbb{N}\), then the set of all possible outcomes becomes the set of all ordered \(T\)-tuples of the numbers \(1\) and \(0\). Hence, we are led to introduce the sample space \[\begin{equation} \Omega\equiv\left\{\omega\equiv\left(\omega_{1},\dots,\omega_{T}\right): \omega_{t}\in\left\{1,0\right\},\quad\forall t=1,\dots,T\right\}, \end{equation}\] which contains \(2^{T}\) distinct outcomes, so that the family of all possible events \(\mathcal{E}\equiv\mathcal{P}\left(\Omega\right)\) is now made up of \(2^{2^{T}}\) distinct events.
Now, fixed any \(t=1,\dots,T\), consider the random variable \(X_{t}:\Omega\rightarrow\mathbb{R}\) given by \[\begin{equation} X_{t}\left(\omega\right)\overset{\text{def}}{=} \left\{ \begin{array} [c]{cc} 1, & \text{if the } t\text{th entry of }\omega \text{ is } 1,\\ 0, & \text{if the } t\text{th entry of }\omega \text{ is } 0,\\ \end{array} \right. \quad\forall\omega\in\Omega, \end{equation}\] and consider the stochastic process \(\left(X_{t}\right)_{t=1}^{T}\equiv \mathbf{X}\). It is clearly seen that for any \(\omega\in\Omega\) the \(\omega\)-sample path of the process \(\mathbf{X}\) is an ordered \(T\)-tuple of the numbers \(1\) and \(0\). In this way, the outcomes of the sample space of the repeated flip of a coin are naturally identified with the sample paths of the process \(\mathbf{X}\). Such a situation is rather general: the outcomes of the sample space of a stochastic phenomenon can always be naturally identified with the sample paths of a particular stochastic process on the sample space. This process is a generalization of the process \(\mathbf{X}\) presented above, referred to as the coordinate process.
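A minimal R sketch of this construction for \(T=3\) coin flips (illustrative code; the object names are ours, not part of these notes' codebase):

```r
# Building the sample space of T = 3 coin flips and the coordinate process.
TT <- 3
Omega <- as.matrix(expand.grid(rep(list(c(0L, 1L)), TT)))  # all 0/1 triples
nrow(Omega)                                                # 2^3 = 8 outcomes
X_t <- function(t, omega) omega[t]                         # coordinate process
omega <- Omega[6, ]                                        # fixing one outcome
sapply(1:TT, X_t, omega=omega)                             # its sample path
```

As expected, the sample path of the coordinate process at \(\omega\) is \(\omega\) itself.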

Let \(\left(x_{t}\right)_{t=1}^{T}\equiv\mathbf{x}\) be an \(N\)-variate real time series and let \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) be an \(N\)-variate real stochastic process on a probability space \(\Omega\).

Definition 3.3 (Stochastic Process and Time Series) We say that \(\mathbf{X}\) is a model for \(\mathbf{x}\) if the time set \(\mathbb{T}\) of \(\mathbf{X}\) contains \(\left\{1,\dots,T\right\}\) and the time series \(\mathbf{x}\) may be thought of as the restriction to \(\left\{1,\dots,T\right\}\) of a sample path of the process. In symbols, \[\begin{equation} x_{t}=X_{t}\left(\omega\right), \end{equation}\] for some \(\omega\in\Omega\) and every \(t\in\left\{1,\dots,T\right\}\).

Time series analysis is a collection of techniques that allow us to infer the structure of a stochastic process which might be a good model for a time series. A good model allows

  1. the interpretation of the time series, that is, the specification of the role played by the various variables in the evolution of the time series;

  2. the forecasting, that is, the statistical prediction of the future evolution of the time series.

Moreover, without a good model it is not possible to accomplish more sophisticated tasks such as

  1. the simulation, that is, the statistical description of future scenarios related to the evolution of the time series (think of the prediction of major earthquakes following a pattern of seismic activity);

  2. the control, that is, the statistical prediction of the influence that a policy on some variables might exert on the evolution of the time series (think of the effect that the monetary policy of a central bank might exert on economic growth);

  3. the hypothesis testing, that is, the statistical assessment of the reliability of some conjectures related to the evolution of the time series (think of the confirmation or refutation of the global warming conjecture).

While inferring a model for a time series is rather easy (any polynomial \(P:\mathbb{R}\rightarrow\mathbb{R}^{N}\) of degree not lower than \(T-1\) such that \(P\left(t\right)=x_{t}\) for every \(t\in\{1,\dots,T\}\) is a model for the time series \(\mathbf{x}\)), inferring a good model may be a very difficult task. This difficulty is due to the fact that the inference has to be based on the analysis of a few sample paths, typically a single one, of the stochastic process which we aim to propose as a good model.
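To make the point concrete, here is a hedged R sketch (our own illustrative code, not from these notes' codebase): a raw polynomial of degree \(T-1\) interpolates a length-\(T\) univariate series exactly, so it is a model in the sense of Definition 3.3, yet it is typically useless for forecasting because it extrapolates wildly.

```r
# Fitting an exact-interpolation polynomial to a short univariate series.
set.seed(42)
t_idx <- 1:6
x <- cumsum(rnorm(6))                           # an arbitrary length-6 series
fit <- lm(x ~ poly(t_idx, degree=5, raw=TRUE))  # degree T-1 = 5 polynomial
max(abs(fitted(fit) - x))                       # numerically zero: exact fit
predict(fit, newdata=data.frame(t_idx=7))       # but the extrapolation is wild
```

The in-sample fit is perfect by construction; the one-step extrapolation, however, is driven entirely by the highest-degree term and bears no relation to the evolution of the series.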

Example 3.1 (Doctor Strange's Simulation Procedure - Avengers) A world-famous data scientist focuses on a simulation and control procedure.

As a first concrete example of a stochastic process and its sample paths, we consider a more formal approach to the random walk.

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\Omega\), with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\), and time set \(\mathbb{N}_{0}\), let \(\alpha,\beta\in\mathbb{R}^{N}\), and let \(\left(Z_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{Z}\) be a sequence of independent and identically distributed random variables on \(\Omega\) with states in \(\mathbb{R}^{N}\) and mean \(0\). Clearly, \(\mathbf{Z}\) is itself a stochastic process with time set \(\mathbb{N}\) and state space \(\mathbb{R}^{N}\). In the sequel, we will call \(\mathbf{Z}\) a strong white noise. We assume that the initial state \(X_{0}\) of the process \(\mathbf{X}\) is independent of the process \(\mathbf{Z}\). Furthermore, we often assume that the initial state \(X_{0}\) is a Dirac random variable concentrated at \(x_{0}\), for some \(x_{0}\in\mathbb{R}^{N}\).

Example 3.2 (Random Walk) We say that \(\mathbf{X}\) is a random walk starting from \(X_{0}\) with drift coefficient \(\alpha\), linear trend coefficient \(\beta\), and state innovation \(\mathbf{Z}\), if the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\alpha + \beta t + X_{t-1} + Z_{t}, \tag{3.1} \end{equation}\] for every \(t\in\mathbb{N}\).

When the random variables in \(\mathbf{Z}\) are Rademacher [resp. Gaussian] distributed, the random walk \(\mathbf{X}\), referred to as a Rademacher [resp. Gaussian] random walk, has many important applications in modeling.
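As a quick illustration (our own sketch, assuming the simplest case \(\alpha=\beta=0\) and \(X_{0}=0\)), a Rademacher random walk can be simulated in R as follows:

```r
# Simulating one sample path of a Rademacher random walk via Equation (3.1)
# with alpha = beta = 0 and starting point X_0 = 0.
set.seed(123)
Z <- sample(c(-1L, 1L), size=100, replace=TRUE)  # Rademacher state innovation
X <- c(0L, cumsum(Z))                            # X_t = X_{t-1} + Z_t
head(X)
```

Every step of the path is exactly \(\pm 1\), in contrast with the Gaussian case considered below.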

We cannot draw the full graphical representation of a random walk with drift and linear trend, but we can draw some, even several, of its sample paths. We consider the Gaussian random walk.

t <- seq(from=-0.49, to=1.00, length.out=150)  # Choosing the time set so that we can think of a past time
                                               # before 0, a present time at 0, and a future time after 0.
a <- 0.05                                      # Choosing the drift coefficient. 
b <- 0.01                                      # Choosing the linear trend coefficient.
x0 <- 0                                        # Choosing the starting point of the GRW.

set.seed(12345, kind=NULL, normal.kind=NULL)   # Setting a random seed for reproducibility.

Gauss_r <- rnorm(n=150, mean=0, sd=9)          # Determining one of the possible values of the Gaussian 
                                               # random variables in the state innovation process and 
                                               # showing the first 30 values taken by the Gaussian random 
                                               # variables in the state innovation process (sample path of 
head(Gauss_r, 30)                              # the state innovation).
##  [1]   5.2697594   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037
##  [7]   5.6708870  -2.4856569  -2.5574377  -8.2738980  -1.0462303  16.3558084
## [13]   3.3356508   4.6819481  -6.7547880   7.3520986  -7.9772177  -2.9841983
## [19]  10.0864139   2.6885133   7.0165973  13.1020657  -5.7989559 -13.9782366
## [25] -14.3793857  16.2458777  -4.3348263   5.5834182   5.5091114  -1.4607988
x_r <- rep(NA,150)                             # Setting an empty vector of length 150 to store 
                                               # the sample path of the Gaussian random walk (GRW), 
                                               # corresponding to the sample path of the state innovation.
x_r[1] <- a + b*t[1] + x0 + Gauss_r[1]         # Determining the first point (after the starting point) 
                                               # of the sample path of the GRW. 

for (n in 2:150)                               # Determining the other points of the sample path of the GRW
{x_r[n] <- a + b*t[n] + x_r[n-1] + Gauss_r[n]} 

x_r_bis <- cumsum(a + b*t + Gauss_r)           # Alternative command to build the sample path.

all(round(x_r-x_r_bis, digits=12)==0)          # Checking the alternative command.
## [1] TRUE
head(x_r)                                      # Showing the first points of the sample path of the GRW.
## [1]  5.314859 11.745254 10.806824  6.770749 12.269236 -4.046767
tail(x_r)                                      # Showing the last points of the sample path of the GRW.
## [1] 214.5137 219.4348 205.5689 213.2756 221.3996 222.7078
set.seed(23451, kind=NULL, normal.kind=NULL)  # Setting another random seed for reproducibility 
                                              # to build another sample path of the GRW.

Gauss_g <- replace(Gauss_r, c(51:150), rnorm(n=100, mean=0, sd=9)) # Building another sample path of the  
                                                                   # Gaussian state innovation process,  
                                                                   # which retains the first 50 sample points 
                                                                   # of the former path.

head(Gauss_g, 30)                             # Showing the sample path.
##  [1]   5.2697594   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037
##  [7]   5.6708870  -2.4856569  -2.5574377  -8.2738980  -1.0462303  16.3558084
## [13]   3.3356508   4.6819481  -6.7547880   7.3520986  -7.9772177  -2.9841983
## [19]  10.0864139   2.6885133   7.0165973  13.1020657  -5.7989559 -13.9782366
## [25] -14.3793857  16.2458777  -4.3348263   5.5834182   5.5091114  -1.4607988
x_g <- cumsum(a + b*t + Gauss_g)              # Building the corresponding sample path of the GRW.
head(x_g)
## [1]  5.314859 11.745254 10.806824  6.770749 12.269236 -4.046767
tail(x_g)
## [1] 129.1990 134.4412 130.0406 131.9856 136.1327 134.5651
set.seed(34512, kind=NULL, normal.kind=NULL)
Gauss_b <- replace(Gauss_r, c(51:150), rnorm(n=100, mean=0, sd=9))
head(Gauss_b, 30)
##  [1]   5.2697594   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037
##  [7]   5.6708870  -2.4856569  -2.5574377  -8.2738980  -1.0462303  16.3558084
## [13]   3.3356508   4.6819481  -6.7547880   7.3520986  -7.9772177  -2.9841983
## [19]  10.0864139   2.6885133   7.0165973  13.1020657  -5.7989559 -13.9782366
## [25] -14.3793857  16.2458777  -4.3348263   5.5834182   5.5091114  -1.4607988
x_b <- cumsum(a + b*t + Gauss_b)
head(x_b)
## [1]  5.314859 11.745254 10.806824  6.770749 12.269236 -4.046767
tail(x_b)
## [1] 147.2113 154.1321 155.6461 155.6430 153.1577 145.6205
Gauss_rw_df <- data.frame(t,x_r,x_g,x_b) # Generating a data frame from the time variable 
                                         # and the three paths of the GRW.
head(Gauss_rw_df)
##       t       x_r       x_g       x_b
## 1 -0.49  5.314859  5.314859  5.314859
## 2 -0.48 11.745254 11.745254 11.745254
## 3 -0.47 10.806824 10.806824 10.806824
## 4 -0.46  6.770749  6.770749  6.770749
## 5 -0.45 12.269236 12.269236 12.269236
## 6 -0.44 -4.046767 -4.046767 -4.046767
tail(Gauss_rw_df)
##        t      x_r      x_g      x_b
## 145 0.95 214.5137 129.1990 147.2113
## 146 0.96 219.4348 134.4412 154.1321
## 147 0.97 205.5689 130.0406 155.6461
## 148 0.98 213.2756 131.9856 155.6430
## 149 0.99 221.3996 136.1327 153.1577
## 150 1.00 222.7078 134.5651 145.6205
show(Gauss_rw_df[45:55,])
##        t      x_r       x_g       x_b
## 45 -0.05 83.91051  83.91051  83.91051
## 46 -0.04 97.10667  97.10667  97.10667
## 47 -0.03 84.43849  84.43849  84.43849
## 48 -0.02 89.59492  89.59492  89.59492
## 49 -0.01 94.89350  94.89350  94.89350
## 50  0.00 83.18231  83.18231  83.18231
## 51  0.01 78.36894  94.21468  99.58635
## 52  0.02 95.94837  98.86937 104.60110
## 53  0.03 96.48099 101.55969 114.65378
## 54  0.04 99.69635 105.31004 115.88370
## 55  0.05 93.70806  85.62532 107.11416
# library(tibble)                        # add_row() and add_column() are provided by the tibble package.
Gauss_rw_df <- add_row(Gauss_rw_df,  t=-0.50, x_r=0, x_g=0, x_b=0, .before=1) # Adding a row to represent  
                                                                              # the starting point of the GRW.
Gauss_rw_df <- add_column(Gauss_rw_df,  n=1:nrow(Gauss_rw_df), .before=t)     # Adding a column to represent  
                                                                              # the index set.
head(Gauss_rw_df)
##   n     t       x_r       x_g       x_b
## 1 1 -0.50  0.000000  0.000000  0.000000
## 2 2 -0.49  5.314859  5.314859  5.314859
## 3 3 -0.48 11.745254 11.745254 11.745254
## 4 4 -0.47 10.806824 10.806824 10.806824
## 5 5 -0.46  6.770749  6.770749  6.770749
## 6 6 -0.45 12.269236 12.269236 12.269236

We plot the paths of the Gaussian random walk with drift and linear trend. First, the scatter plot.

# library(ggplot2)
Data_df <- Gauss_rw_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Sample Paths of a Gaussian Random Walk with Drift and Linear Trend"))
subtitle_content <- bquote(atop(paste("path length ", .(length), " sample points,    starting point ", x[0]==0, 
                                      ",    drift par. ", alpha==.(a), ",  linear trend par. ", beta==.(b),","),
                                paste("noise random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    noise mean par. ", mu==0, ",    noise var. par. ", sigma^2==9,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$n[1]
x_breaks_up <- Data_df$n[length]
x_binwidth <- ceiling((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- c(x_breaks_low,seq(from=x_binwidth+1, to=x_breaks_up, by=x_binwidth))
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(Data_df$t[x_breaks], scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 10
y_max <- max(Data_df$x_r,Data_df$x_b,Data_df$x_g)
y_min <- min(Data_df$x_r,Data_df$x_b,Data_df$x_g)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_k <- bquote("random seed" ~  12345)
col_r <- bquote("random seed" ~  12345)
col_g <- bquote("random seed" ~  23451)
col_b <- bquote("random seed" ~  34512)
leg_labs <- c(col_k, col_r, col_g, col_b)
leg_breaks <- c("col_k", "col_r", "col_g", "col_b")
leg_cols <- c("col_k"="black", "col_r"="red", "col_g"="green3", "col_b"="blue")
Gauss_rw_df_sp <- ggplot(Data_df, aes(x=n)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = which(Data_df$t==0), size=0.3, colour="black") +
  geom_point(data=subset(Data_df, Data_df$t > 0), alpha=1, size=1.5, shape=19, aes(y=x_b, color="col_b")) +
  geom_point(data=subset(Data_df, Data_df$t > 0), alpha=1, size=1.5, shape=19, aes(y=x_g, color="col_g")) +
  geom_point(data=subset(Data_df, Data_df$t > 0), alpha=1, size=1.5, shape=19, aes(y=x_r, color="col_r")) +
  geom_point(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1.5, shape=19, aes(y=x_r, color="col_k")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Gauss_rw_df_sp)

Second, the line plot.

title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of Three Sample Paths of a Gaussian Random Walk with Drift and Linear Trend"))
Gauss_rw_df_lp <- ggplot(Data_df, aes(x=n)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = which(Data_df$t==0), size=0.3, colour="black") +
  geom_line(data=subset(Data_df, Data_df$t >= 0), alpha=1, size=0.7, group=1, aes(y=x_b, color="col_b")) +
  geom_line(data=subset(Data_df, Data_df$t >= 0), alpha=1, size=0.7, group=1, aes(y=x_g, color="col_g")) +
  geom_line(data=subset(Data_df, Data_df$t >= 0), alpha=1, size=0.7, group=1, aes(y=x_r, color="col_r")) +
  geom_line(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, group=1, aes(y=x_r, color="col_k")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Gauss_rw_df_lp)

What is shown above is similar to the way Doctor Strange sees the future: pathwise. However, we are not Doctor Strange, so what can we do? We can try to determine prediction bands at the confidence level \((1-\alpha)\), within which we can be \(100\left(1-\alpha\right)\%\) confident of finding the future paths of the Gaussian random walk, for any \(\alpha\in\left(0,1\right)\).
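For the Gaussian random walk with drift and linear trend of Equation (3.1), these bands can also be written in closed form (a sketch under the model's assumptions; we write \(\gamma\) for the significance level to avoid a clash with the drift coefficient \(\alpha\)). Conditionally on the path up to time \(T\), \[\begin{equation} X_{T+h}\mid X_{1},\dots,X_{T}\sim\mathcal{N}\left(X_{T}+h\alpha+\beta\sum_{j=1}^{h}\left(T+j\right),\; h\sigma^{2}\right), \end{equation}\] so the level-\((1-\gamma)\) prediction band at horizon \(h\) is \[\begin{equation} X_{T}+h\alpha+\beta\left(hT+\frac{h\left(h+1\right)}{2}\right)\pm z_{1-\gamma/2}\,\sigma\sqrt{h}, \end{equation}\] where \(z_{1-\gamma/2}\) is the standard Gaussian quantile: the band widens like \(\sqrt{h}\). The code below estimates these quantities from the observed path.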

# We consider the path of the Gaussian random walk from the past time up to the present time m.
n <- nrow(Gauss_rw_df)
m <- 50
f <- 25
y <- Gauss_rw_df$x_r[1:(m+1)]                  # Note the parentheses: 1:m+1 would be read by R as (1:m)+1.
# We model y as a random walk with drift and linear trend.
# library(forecast)                            # Provides Arima() and forecast().
y_ARIMA <- Arima(y, order=c(0,1,0), include.constant = TRUE, include.drift = TRUE, method="ML")
# We forecast the future path and build the forecast intervals.
y_ARIMA_for <- forecast::forecast(y_ARIMA, h=f, level=c(0.90,0.95,0.99))
# We build a data frame to plot the forecasted path and the forecast intervals by extracting the data 
# from the y_ARIMA_for object.
y_ARIMA_for_point_mean <- as.vector(y_ARIMA_for$mean)
y_ARIMA_for_int  <- cbind(as.vector(y_ARIMA_for$lower[,3]),as.vector(y_ARIMA_for$lower[,2]),
                          as.vector(y_ARIMA_for$lower[,1]),as.vector(y_ARIMA_for$upper[,1]),
                          as.vector(y_ARIMA_for$upper[,2]),as.vector(y_ARIMA_for$upper[,3]))
Gauss_rw_for_df <- Gauss_rw_df
Gauss_rw_for_df <- add_column(Gauss_rw_for_df, 
                              x_r_point_mean_for=c(rep(NA,m),Gauss_rw_df$x_r[m+1],y_ARIMA_for_point_mean,rep(NA,n-(m+f+1))),
                              x_r_int_for_0_005=c(rep(NA,m),Gauss_rw_df$x_r[m+1],y_ARIMA_for_int[,1],rep(NA,n-(m+f+1))),
                              x_r_int_for_0_025=c(rep(NA,m),Gauss_rw_df$x_r[m+1],y_ARIMA_for_int[,2],rep(NA,n-(m+f+1))),
                              x_r_int_for_0_05=c(rep(NA,m),Gauss_rw_df$x_r[m+1],y_ARIMA_for_int[,3],rep(NA,n-(m+f+1))),
                              x_r_int_for_0_95=c(rep(NA,m),Gauss_rw_df$x_r[m+1],y_ARIMA_for_int[,4],rep(NA,n-(m+f+1))),
                              x_r_int_for_0_975=c(rep(NA,m),Gauss_rw_df$x_r[m+1],y_ARIMA_for_int[,5],rep(NA,n-(m+f+1))),
                              x_r_int_for_0_995=c(rep(NA,m),Gauss_rw_df$x_r[m+1],y_ARIMA_for_int[,6],rep(NA,n-(m+f+1))),
                              .after="x_r")
# In the end, we plot the full path, the forecasted path and the forecast intervals
Data_df <- Gauss_rw_for_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Three Sample Paths of a Gaussian Random Walk with Drift and Linear Trend, with Prediction Bands"))
subtitle_content <- bquote(atop(paste("path length ", .(length), " sample points,    starting point ", x[0]==0, 
                                      ",    drift par. ", alpha==.(a), ",  linear trend par. ", beta==.(b),","),
                                paste("noise random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    noise mean par. ", mu==0, ",    noise var. par. ", sigma^2==81,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
x_breaks_num <- 15
x_breaks_low <- Data_df$n[1]
x_breaks_up <- Data_df$n[length]
x_binwidth <- ceiling((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- c(x_breaks_low,seq(from=x_binwidth+1, to=x_breaks_up, by=x_binwidth))
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(Data_df$t[x_breaks], scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote(~ x[t] ~ "values")
y_breaks_num <- 10
y_max <- max(Data_df$x_r,Data_df$x_g,Data_df$x_b,Data_df$x_r_int_for_0_995, na.rm=TRUE)
y_min <- min(Data_df$x_r,Data_df$x_g,Data_df$x_b,Data_df$x_r_int_for_0_005, na.rm=TRUE)
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor((y_min/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((y_max/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
line_k <- bquote("past path - r.s." ~  12345)
line_m <- bquote("predicted path")
line_g <- bquote("90% pred.int.")
line_b <- bquote("95% pred.int.")
line_r <- bquote("99% pred.int.")
leg_line_labs   <- c(line_k, line_m, line_g, line_b, line_r)
leg_line_breaks <- c("line_k", "line_m", "line_g", "line_b", "line_r")
leg_line_cols   <- c("line_k"="black", "line_m"="brown", "line_g"="green", "line_b"="blue", "line_r"="red")
# leg_line_types  <- c("line_k"="solid", "line_m"="solid", "line_r"="solid", "line_g"="solid", "line_b"="solid")

shape_r <- bquote("future path - r.s." ~  12345)
shape_g <- bquote("future path - r.s." ~  23451)
shape_b <- bquote("future path - r.s." ~  34512)

leg_shape_labs   <- c(shape_r, shape_g, shape_b)
leg_shape_breaks <- c("shape_r","shape_g", "shape_b")
leg_shape_cols   <- c("shape_r"="red", "shape_g"="green", "shape_b"="blue")
# leg_shape_types  <- c("shape_r"= 19, "shape_g"= 19, "shape_b"= 19)

fill_g <- bquote("90% pred. band")
fill_b <- bquote("95% pred. band")
fill_r <- bquote("99% pred. band")

leg_fill_labs <- c(fill_g, fill_b, fill_r)
leg_fill_breaks <- c("fill_g", "fill_b", "fill_r")
leg_fill_cols <- c("fill_g"="lightgreen", "fill_b"="cyan", "fill_r"="orangered")

leg_col_labs   <- c(leg_line_labs,leg_shape_labs)
leg_col_breaks <- c(leg_line_breaks,leg_shape_breaks)
leg_col_cols   <- c(leg_line_cols,leg_shape_cols)
# leg_col_types  <- c(leg_line_types,leg_shape_types)

Gauss_rw_lp <- ggplot(Data_df, aes(x=n)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = which(Data_df$t==0), size=0.3, colour="black") +
  geom_point(data=subset(Data_df, Data_df$t >= t[m+1]), aes(y=x_b, colour="shape_b"),
             shape=19, alpha=1, size=1.0) +
  geom_point(data=subset(Data_df, Data_df$t >= t[m+1]), aes(y=x_g, colour="shape_g"),
             shape=19, alpha=1, size=1.0) +
  geom_point(data=subset(Data_df, Data_df$t >= t[m+1]), aes(y=x_r, colour="shape_r"), 
             shape=19, alpha=1, size=1.0) +
  geom_line(data=subset(Data_df, Data_df$t <= t[m+1]), aes(y=x_r, color="line_k"),
            linetype="solid", alpha=1, size=1, group=1) +
  geom_line(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])),
            aes(y=x_r_point_mean_for, colour="line_m"), linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])),
           aes(y=x_r_int_for_0_005, colour="line_r"), linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])),
           aes(y=x_r_int_for_0_995, colour="line_r"), linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])),
            aes(y=x_r_int_for_0_025, colour="line_b"), linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])),
            aes(y=x_r_int_for_0_975, colour="line_b"), linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])), 
           aes(y=x_r_int_for_0_05, colour="line_g"), linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])), 
           aes(y=x_r_int_for_0_95, colour="line_g"), linetype="solid", alpha=1, size=1) +
  geom_ribbon(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])), alpha=0.3, 
              aes(ymin=x_r_int_for_0_025, ymax=x_r_int_for_0_05, fill="fill_b")) +
  geom_ribbon(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])), alpha=0.3,
              aes(ymin=x_r_int_for_0_95, ymax=x_r_int_for_0_975, fill="fill_b")) +
  geom_ribbon(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])), alpha=0.3,
              aes(ymin=x_r_int_for_0_005, ymax=x_r_int_for_0_025, fill="fill_r")) +
  geom_ribbon(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])), alpha=0.3,
              aes(ymin=x_r_int_for_0_975, ymax=x_r_int_for_0_995, fill="fill_r")) +
  geom_ribbon(data=subset(Data_df, (Data_df$t >= t[m+1] & Data_df$t <= t[m+f+1])), alpha=0.3,
              aes(ymin=x_r_int_for_0_05, ymax=x_r_int_for_0_95, fill="fill_g")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  guides(linetype="none", shape="none") +
  scale_colour_manual(name="Legend", labels=leg_col_labs, values=leg_col_cols, breaks=leg_col_breaks) +
  scale_fill_manual(name="", labels=leg_fill_labs, values=leg_fill_cols, breaks=leg_fill_breaks) +
  guides(colour=guide_legend(order=1,
                             override.aes=list(
                               linetype=c("solid", "solid", "solid", "solid", "solid", "blank", "blank", "blank"),
                               shape=c(NA,NA,NA,NA,NA,19,19,19))),
         fill=guide_legend(order=2)) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Gauss_rw_lp)

As discussed above, with slightly different notation, given an \(N\)-variate real time series \(\left(y_{t}\right)_{t\in T}\equiv\mathbf{y}\), our goal is to determine an \(N\)-variate real stochastic process \(\left(Y_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{Y}\) which is a good model for \(\mathbf{y}\). However, rather than seeking \(\mathbf{Y}\) directly, the general approach is to try to build \(\mathbf{Y}\) depending on other processes, that is, to seek an \(M\)-variate real stochastic process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\), an \(N\)-variate real stochastic process \(\left(N_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{N}\), and a Borel function \(F:\mathbb{R}_{+}\times\mathbb{R}^{M}\rightarrow\mathbb{R}^{N}\) such that we can write \[\begin{equation} \tag{3.2} Y_{t}=F(t,X_{t})+N_{t}, \end{equation}\] for every \(t\in\mathbb{T}\).
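As a toy illustration, the explanatory form (3.2) can be simulated directly. In the following sketch the regression function F_fun, the regressor path X, and the noise path N are arbitrary choices made for illustration only; nothing about them is prescribed by the text.

```r
# A minimal simulation of the explanatory form Y_t = F(t, X_t) + N_t in (3.2).
# F_fun, X, and N are illustrative choices, not prescribed by the text.
set.seed(12345)
TT <- 100
t_idx <- 1:TT
X <- rnorm(TT, mean = 2, sd = 1)          # sample path of the regressor process
N <- rnorm(TT, mean = 0, sd = 0.5)        # sample path of the zero-mean noise process
F_fun <- function(t, x) 0.1 * t + 2 * x   # an arbitrary (Borel) regression function
Y <- F_fun(t_idx, X) + N                  # sample path of the explained process
length(Y)                                 # 100
```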

Definition 3.4 (Model representation) The process \(\mathbf{Y}\) [resp. \(\mathbf{X}\)] which satisfies Equation (3.2) is called the explained or regressand process [resp. explanatory or regressor process]. The process \(\mathbf{N}\) is called the noise process. The function \(F:\mathbb{R}_{+}\times\mathbb{R}^{M}\rightarrow\mathbb{R}^{N}\) is called the regression function.

Equation (3.2) is said to be a representation of the process \(\mathbf{Y}\) in explanatory form. Nevertheless, it is often beneficial to assume that the entries of the explanatory process \(\mathbf{X}\) are lagged forms of the process \(\mathbf{Y}\) itself, for instance \[\begin{equation} \tag{3.3} X_{t}\equiv\Phi\left(Y_{t-1},Y_{t-2},\dots,Y_{t-\ell}\right) \end{equation}\] for some \(\ell\in\mathbb{N}\), some Borel function \(\Phi:\mathsf{X}_{n=1}^{\ell}\mathbb{R}^{N}\rightarrow\mathbb{R}^{M}\), and for every \(t\in\mathbb{T}\). In this case, Equation (3.2) is more properly said to be a representation of the process \(\mathbf{Y}\) in predictive form. It is also possible to have mixed explanatory-predictive forms in which the process \(\mathbf{X}\) takes the form \[\begin{equation} \tag{3.4} X_{t}\equiv\Phi\left(Y_{t-1},Y_{t-2},\dots,Y_{t-\ell},Z_{t}\right) \end{equation}\] for some \(\ell\in\mathbb{N}\), some Borel function \(\Phi:\left(\mathsf{X}_{n=1}^{\ell}\mathbb{R}^{N}\right)\times\mathbb{R}^{M}\rightarrow\mathbb{R}^{M}\), and for every \(t\in\mathbb{T}\), where \(\left(Z_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{Z}\) is an \(M\)-variate real stochastic process which is independent of, or at least uncorrelated with, the lagged values of the process \(\mathbf{Y}\).
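In practice, the lagged regressors of the predictive form (3.3) can be assembled with the base R function embed. The sketch below uses a synthetic path with \(\ell=2\) and takes \(\Phi\) to be the simple stacking of the lags; both choices are for illustration only.

```r
# Building X_t = (Y_{t-1}, Y_{t-2}) from a synthetic path y, as in (3.3) with
# l = 2 and Phi the identity stacking of the lags.
set.seed(1)
y <- cumsum(rnorm(50))           # a synthetic sample path
l <- 2
E <- embed(y, l + 1)             # row i: (y_{i+l}, y_{i+l-1}, ..., y_i)
Y_t   <- E[, 1]                  # current values y_t, for t = l+1, ..., 50
X_lag <- E[, -1, drop = FALSE]   # lagged regressors (y_{t-1}, y_{t-2})
dim(X_lag)                       # 48  2
```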

As is standard, the noise process \(\mathbf{N}\) is assumed to have zero mean and to be independent of, or at least uncorrelated with, the process \(\mathbf{X}\).

Note that Equation (3.2) is just a generalization of an ordinary regression equation relating an explained random variable, an explanatory random variable, and a noise random variable, in which time plays a role.

In general, to ease the inference of the structure of the regression function \(F:\mathbb{R}_{+}\times\mathbb{R}^{M}\rightarrow\mathbb{R}^{N}\), it is common practice to try to single out separate components.

  1. A mean component \(m:\mathbb{T}\rightarrow\mathbb{R}^{N}\equiv\mathbf{m}\), which is a deterministic function of time accounting for possible regular dynamics, shown by the graph \(\Gamma_{\mathbf{y}}\) of the time series \(\mathbf{y}\), not attributable to seasonal changes. In particular, we speak of a positive [resp. negative] trend component when \(\mathbf{m}\) needs to be an increasing [resp. decreasing] function; we speak of a cyclic component when, to model \(\mathbf{y}\), the mean component \(\mathbf{m}\) needs to exhibit fluctuations, possibly with different widths, in occasional periods of the observation time (think of the business cycle).

  2. A seasonal component \(s:\mathbb{T}\rightarrow\mathbb{R}^{N}\equiv\mathbf{s}\), which is another deterministic function of time accounting for possible fluctuations, with similar widths, shown by \(\Gamma_{\mathbf{y}}\) in regular periods of the observation time, typically hours, weeks, months, quarters, and years (think of the seasonal cycle). Writing \(\pi\) for the length of the regular fluctuation period, the standard prescription for the seasonal component \(\mathbf{s}\) is that \[\begin{equation} \sum_{t=1}^{\pi}s(t)=0\quad\text{and}\quad s(t)=s(t+\pi), \tag{3.5} \end{equation}\] for every \(t\in\mathbb{T}\) such that \(t+\pi\in\mathbb{T}\).
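For instance, a sinusoid with period \(\pi=12\) satisfies the prescription (3.5), as the following quick check illustrates (the amplitude 2 and the period are arbitrary choices):

```r
# Checking the seasonal-component prescription (3.5) for a period-12 sinusoid.
pi_len <- 12                                    # the period (pi in the text)
s_fun <- function(t) 2 * sin(2 * pi * t / pi_len)
sum(s_fun(1:pi_len))                            # ~ 0: zero sum over one period
all.equal(s_fun(1:24), s_fun((1:24) + pi_len))  # TRUE: s(t) = s(t + pi)
```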

As a consequence, in several cases we try to represent the process \(\mathbf{Y}\) by an additive decomposition of the form \[\begin{equation} Y_{t}=m(t)+s(t)+g(X_{t})+N_{t}, \tag{3.6} \end{equation}\] for every \(t\in\mathbb{T}\), where \(g:\mathbb{R}^{M}\rightarrow\mathbb{R}^{N}\) is an appropriate function. In other cases, especially with reference to models for time series in financial markets, it might be more useful to consider a multiplicative decomposition of the form \[\begin{equation} Y_{t}=m(t)\times s(t)\times g(X_{t})\times N_{t}, \tag{3.7} \end{equation}\] for every \(t\in\mathbb{T}\).

However, we can always pass from a multiplicative to an additive [resp. from an additive to a multiplicative] decomposition by a logarithm [resp. exponential] transformation.
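A one-line check of this remark, with illustrative positive factors:

```r
# The log of the multiplicative decomposition (3.7) is the additive
# decomposition (3.6) of the log-series (all factors assumed positive).
m_t <- 100; s_t <- 1.2; g_t <- 0.9; n_t <- 1.05  # illustrative positive factors
y_mult <- m_t * s_t * g_t * n_t
all.equal(log(y_mult), log(m_t) + log(s_t) + log(g_t) + log(n_t))  # TRUE
```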

Loosely speaking, when dealing with a time series \(\mathbf{y}\), with reference to the additive decomposition (3.6), a rather standard procedure prescribes the following steps.

  1. Explore the possibility of subjecting \(\mathbf{y}\) to an invertible non-linear transformation, a so-called Box-Cox transformation, often the logarithm or the square root, to remove simple forms of heteroskedasticity, which show up as a pronounced variation of the spread of the points in the time series graph around the regression line. This leads to the homoskedastic transformed time series \(\left(\tilde{y}_{t}\right)_{t=1}^{T}\equiv\tilde{\mathbf{y}}\), such that \[\begin{equation} \tilde{y}_{t}=BC(y_{t}), \tag{3.8} \end{equation}\] for every \(t=1,\dots,T\), where \(BC:\mathbb{R}^{N}\rightarrow\mathbb{R}^{N}\) is the Box-Cox transformation considered.

  2. Try to remove a mean component in the transformed time series \(\tilde{\mathbf{y}}\) by time-regressions (e.g. a linear time-regression) or by smoothing procedures (e.g. a moving average). This leads to the demeaned homoskedastic transformed time series \(\left(\tilde{y}_{t}^{0}\right)_{t=1}^{T}\equiv\tilde{\mathbf{y}}^{0}\) such that \[\begin{equation} \tilde{y}_{t}^{0}=\tilde{y}_{t}-m(t), \tag{3.8} \end{equation}\] for every \(t=1,\dots,T\).

  3. Try to remove a seasonal component from the demeaned time series \(\tilde{\mathbf{y}}^{0}\) by spectral decomposition (e.g. a linear combination of sinusoids) or by a deseasonalizing procedure (e.g. a seasonal average). This leads to the deseasonalized demeaned homoskedastic transformed time series \(\left(\tilde{y}_{t}^{0,*}\right)_{t=1}^{T}\equiv\tilde{\mathbf{y}}^{0,*}\) such that \[\begin{equation} \tilde{y}_{t}^{0,*}=\tilde{y}_{t}^{0}-s(t), \tag{3.9} \end{equation}\] for every \(t=1,\dots,T\).

  4. Check whether the time series \(\tilde{\mathbf{y}}^{0,*}\) can be represented by a standard noise process, typically a white noise, an ARMA, or a GARCH process, or whether we need to represent \(\tilde{\mathbf{y}}^{0,*}\) as the sample path of an explanatory or predictor stochastic process \(\mathbf{X}\). For instance, it may happen that, after transforming, demeaning, and deseasonalizing, the time series \(\tilde{\mathbf{y}}^{0,*}\) still exhibits some stochastic trend, and we need to remove it by differencing \(\tilde{\mathbf{y}}^{0,*}\) a number of times, until we obtain a detrended residual. This form of detrending is equivalent to representing \(\tilde{\mathbf{y}}^{0,*}\) in predictive form. In this case, what remains is the noisy component of \(\tilde{\mathbf{y}}^{0,*}\), and we will try to represent it by a simple noise process.
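The steps above can be sketched on a synthetic monthly series. In this minimal illustration the Box-Cox transformation is the logarithm, the mean component is removed by a linear time-regression, and the seasonal component by monthly averages; all of these choices are illustrative, not prescriptive.

```r
# A minimal sketch of steps 1-3 of the procedure on a synthetic monthly series.
set.seed(12345)
TT <- 120
t_idx <- 1:TT
season <- rep(sin(2 * pi * (1:12) / 12), length.out = TT)
y <- exp(0.01 * t_idx + season + rnorm(TT, sd = 0.1))  # heteroskedastic in levels
# Step 1: Box-Cox (log) transformation to stabilize the variance.
y_tilde <- log(y)
# Step 2: remove the mean component by a linear time-regression.
y_demeaned <- residuals(lm(y_tilde ~ t_idx))
# Step 3: remove the seasonal component by monthly averages.
month <- rep(1:12, length.out = TT)
seas_means <- tapply(y_demeaned, month, mean)
y_deseas <- y_demeaned - seas_means[month]
# Step 4 would now check whether y_deseas can be modelled by a simple noise process.
round(mean(y_deseas), 10)  # 0
```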

Of course, there are countless cases which cannot be completely dealt with by this procedure, or which can be handled by a simpler one. For instance, in the absence of a mean and a seasonal component, we can try to model the series \(\tilde{\mathbf{y}}\) directly by an ARMA process. However, the above procedure is a rather useful modus operandi.

As a first example of application of the above procedure, we consider the analysis of the so-called Monthly AU Red Wine Sales (MARWS) time series, which records the monthly sales (in kilolitres) of red wine in Australia from Jan 1980 to Jul 1995. This time series is contained in the Time Series Data Library (tsdl) package (see https://pkg.yangzhuoranyang.com/tsdl/), created by Robin J. Hyndman.

In the following code chunk, we install the package tsdl and load and show the content of the library tsdl. Then, after locating the desired time series in the library, we create the basic objects containing the data of our interest.

# To install package tsdl execute in a (new) R script the two following code lines:
# install.packages("devtools")
# devtools::install_github("FinYang/tsdl")

library(tsdl)
show(tsdl)
## Time Series Data Library: 648 time series  
## 
##                        Frequency
## Subject                 0.1 0.25   1   4   5   6  12  13  52 365 Total
##   Agriculture             0    0  37   0   0   0   3   0   0   0    40
##   Chemistry               0    0   8   0   0   0   0   0   0   0     8
##   Computing               0    0   6   0   0   0   0   0   0   0     6
##   Crime                   0    0   1   0   0   0   2   1   0   0     4
##   Demography              1    0   9   2   0   0   3   0   0   2    17
##   Ecology                 0    0  23   0   0   0   0   0   0   0    23
##   Finance                 0    0  23   5   0   0  20   0   2   1    51
##   Health                  0    0   8   0   0   0   6   0   1   0    15
##   Hydrology               0    0  42   0   0   0  78   1   0   6   127
##   Industry                0    0   9   0   0   0   2   0   1   0    12
##   Labour market           0    0   3   4   0   0  17   0   0   0    24
##   Macroeconomic           0    0  18  33   0   0   5   0   0   0    56
##   Meteorology             0    0  18   0   0   0  17   0   0  12    47
##   Microeconomic           0    0  27   1   0   0   7   0   1   0    36
##   Miscellaneous           0    0   4   0   1   1   3   0   1   0    10
##   Physics                 0    0  12   0   0   0   4   0   0   0    16
##   Production              0    0   4  14   0   0  28   1   1   0    48
##   Sales                   0    0  10   3   0   0  24   0   9   0    46
##   Sport                   0    1   1   0   0   0   0   0   0   0     2
##   Transport and tourism   0    0   1   1   0   0  12   0   0   0    14
##   Tree-rings              0    0  34   0   0   0   1   0   0   0    35
##   Utilities               0    0   2   1   0   0   8   0   0   0    11
##   Total                   1    1 300  64   1   1 240   3  16  21   648
# str(meta_tsdl)
tsdl_description <- meta_tsdl$description    # Storing library tsdl description in the tsdl_description list.
# show(tsdl_description)                     # Showing the content of tsdl_description list.
show(tsdl_description[155:160])              # Showing part of the content of tsdl_description list. 
## [[1]]
## [1] "Monthly number of unemployed persons in Australia. Feb 1978 – Aug 1995. Compare DOLE.DAT"
## 
## [[2]]
## [1] "Monthly Australian wine sales: thousands of litres. By wine makers in bottles <= 1 litre."
## 
## [[3]]
## [1] "Monthly production of woollen yarn in Australia: tonnes. Jan 1965 – Aug 1995."
## 
## [[4]]
## [1] "Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994."
## 
## [[5]]
## [1] "Tree: Rio Cisne, Chubut Ficu, Argentina. Alerce -4209-07133"
## 
## [[6]]
## [1] "Monthly sales of Tasty Cola"
                                             # Note that shown records are renumbered, so that 
                                             # the first shown record corresponds to the original 155th record.
head(tsdl[[156]])                            # Showing the 156th element of the library tsdl.
##          Fortified Drywhite Sweetwhite  Red Rose Sparkling Total
## Jan 1980      2585     1954         85  464  112      1686 15136
## Feb 1980      3368     2302         89  675  118      1591 16733
## Mar 1980      3210     3054        109  703  129      2304 20016
## Apr 1980      3111     2414         95  887   99      1712 17708
## May 1980      3756     2226         91 1139  116      1471 18019
## Jun 1980      4216     2725         95 1077  168      1377 19227
## attr(,"source")
## [1] Australian Bureau of Statistics
## attr(,"description")
## [1] Monthly Australian wine sales: thousands of litres. By wine makers in bottles <= 1 litre.
## attr(,"subject")
## [1] Sales
tail(tsdl[[156]])
##          Fortified Drywhite Sweetwhite  Red Rose Sparkling Total
## Feb 1995      1482     3819        230 1749   39      1402    NA
## Mar 1995      1818     4067        188 2459   45      1897    NA
## Apr 1995      2262     4022        195 2618   52      1862    NA
## May 1995      2612     3937        189 2585   28      1670    NA
## Jun 1995      2967     4365        220 3310   40      1688    NA
## Jul 1995      3179     4290        274 3923   62      2031    NA
## attr(,"source")
## [1] Australian Bureau of Statistics
## attr(,"description")
## [1] Monthly Australian wine sales: thousands of litres. By wine makers in bottles <= 1 litre.
## attr(,"subject")
## [1] Sales

We check the class of the object tsdl[[156]].

class(tsdl[[156]])
## [1] "mts"    "ts"     "matrix"

We check whether the multiple time series (mts) object tsdl[[156]] contains missing values. (Note that applying is.na to the whole tsdl list would test the list elements, not the individual observations.) As the tail output above already shows, the Total column has missing values toward the end of the sample.

sum(is.na(tsdl[[156]]))

We extract the mts object MAWS_mts (Monthly Australian Wine Sales) from the library tsdl.

MAWS_mts <- tsdl[[156]]                       

We convert the mts object MAWS_mts to the data frame (df) object MAWS_df.

# Converting mts to df object.
MAWS_df <- data.frame(Year = trunc(time(MAWS_mts)), Month=month.abb[cycle(MAWS_mts)], MAWS_mts) 
head(MAWS_df)
##   Year Month Fortified Drywhite Sweetwhite  Red Rose Sparkling Total
## 1 1980   Jan      2585     1954         85  464  112      1686 15136
## 2 1980   Feb      3368     2302         89  675  118      1591 16733
## 3 1980   Mar      3210     3054        109  703  129      2304 20016
## 4 1980   Apr      3111     2414         95  887   99      1712 17708
## 5 1980   May      3756     2226         91 1139  116      1471 18019
## 6 1980   Jun      4216     2725         95 1077  168      1377 19227
tail(MAWS_df)
##     Year Month Fortified Drywhite Sweetwhite  Red Rose Sparkling Total
## 182 1995   Feb      1482     3819        230 1749   39      1402    NA
## 183 1995   Mar      1818     4067        188 2459   45      1897    NA
## 184 1995   Apr      2262     4022        195 2618   52      1862    NA
## 185 1995   May      2612     3937        189 2585   28      1670    NA
## 186 1995   Jun      2967     4365        220 3310   40      1688    NA
## 187 1995   Jul      3179     4290        274 3923   62      2031    NA

We add an index column to MAWS_df, which just replicates the names of the rows (row numbers) but is useful for plotting purposes, and, for the sake of simplicity, we remove the columns we are not interested in, to create the data frame MARWS_df (Monthly AU Red Wine Sales). Finally, we rename the column Red to RWS. We are now in a position to carry on our analysis.

library(tibble)    # provides add_column()
library(dplyr)     # provides rename()
MAWS_df <- add_column(MAWS_df, t=c(1:nrow(MAWS_df)), .before="Year")
MARWS_df <- subset(MAWS_df, select=-c(Fortified, Drywhite, Sweetwhite, Rose, Sparkling, Total))
MARWS_df <-  rename(MARWS_df, RWS=Red)
head(MARWS_df)
##   t Year Month  RWS
## 1 1 1980   Jan  464
## 2 2 1980   Feb  675
## 3 3 1980   Mar  703
## 4 4 1980   Apr  887
## 5 5 1980   May 1139
## 6 6 1980   Jun 1077
tail(MARWS_df)
##       t Year Month  RWS
## 182 182 1995   Feb 1749
## 183 183 1995   Mar 2459
## 184 184 1995   Apr 2618
## 185 185 1995   May 2585
## 186 186 1995   Jun 3310
## 187 187 1995   Jul 3923

First, we split the RWS time series into two parts: the training set or in-sample set, the initial part of the time series amounting to about \(90\%\) of the whole series, and the test set, also called validation set or out-of-sample set, the remaining final part of the series. Then we draw a scatter plot.

# The scatter plot
Data_df <- MARWS_df
length <- nrow(Data_df)
TrnS_length <- floor(length*0.9)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of AU Red Wine Monthly Sales In-Sample and Out-of-Sample Set from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote("Red Wine Monthly Sales (kliters)")
y_breaks_num <- 10
y_max <- max(na.omit(Data_df$RWS))
y_min <- min(na.omit(Data_df$RWS))
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor((y_min/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((y_max/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_k <- bquote("in-sample set")
col_b <- bquote("out-of-sample set")
col_g <- bquote("regression line")
col_r <- bquote("LOESS curve")
leg_labs   <- c(col_k, col_b, col_g, col_r)
leg_cols   <- c("col_k"="black", "col_b"="blue", "col_r"="red", "col_g"="green")
leg_breaks <- c("col_k", "col_b", "col_g", "col_r")
MARWS_sp <- ggplot(Data_df) +
  geom_smooth(data=subset(Data_df, Data_df$t <= t[TrnS_length]), alpha=1, size = 0.8, linetype="solid", 
              aes(x=t, y=RWS, color="col_g"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=FALSE) +
  geom_smooth(data=subset(Data_df, Data_df$t <= t[TrnS_length]), alpha=1, size = 0.8, linetype="dashed", 
              aes(x=t, y=RWS, color="col_r"), method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(data=subset(Data_df, Data_df$t <= t[TrnS_length]), alpha=1, size=1.5, shape=19, 
             aes(x=t, y=RWS, color="col_k")) +
  geom_point(data=subset(Data_df, Data_df$t > t[TrnS_length]), alpha=1, size=1.5, shape=19, 
             aes(x=t, y=RWS, color="col_b")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,19,NA,NA), 
                                                           linetype=c("blank", "blank", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust=0.5),
        plot.subtitle=element_text(hjust= 0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_sp)

Second, we draw the line plot.

title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of AU Red Wine Monthly Sales In-Sample and Out-of-Sample Set from ", .(First_Date), " to ", .(Last_Date))))
MARWS_lp <- ggplot(Data_df) +
  geom_smooth(data=subset(Data_df, Data_df$t <= t[TrnS_length]), alpha=1, size = 0.8, linetype="solid", 
              aes(x=t, y=RWS, color="col_g"), method = "lm" , formula = y ~ x, se=FALSE, 
              fullrange=FALSE) +
  geom_smooth(data=subset(Data_df, Data_df$t <= t[TrnS_length]), alpha=1, size = 0.8, linetype="dashed", 
              aes(x=t, y=RWS, color="col_r"), method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(data=subset(Data_df, Data_df$t <= t[TrnS_length]), alpha=1, size=0.8, linetype="solid", 
            aes(x=t, y=RWS, color="col_k", group=1)) +
  geom_line(data=subset(Data_df, Data_df$t >= t[TrnS_length]), alpha=1, size=0.8, linetype="solid", 
            aes(x=t, y=RWS, color="col_b", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(linetype=c("solid", "solid", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust=0.5),
        plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_lp)

From the inspection of both the scatter plot and the line plot, the training set of the RWS time series presents visual evidence of heteroskedasticity. In particular, the spread of the points of the graph around the regression line increases in time. In addition, it is possible to see an increasing trend and a rather clear seasonal component. A non-linear transformation of the RWS time series might be useful to remove the heteroskedasticity.

However, assume for a while that we notice neither the increasing spread of the graph around the regression line nor the seasonal component, and that, in light of the information provided by the in-sample set of the RWS time series alone, we consider the possibility that a simple predictor linear model, more specifically a white noise with drift and linear trend, might be a good model for the time series. Formally, this leads to assuming that the regression function \(F:\mathbb{R}_{+}\times\mathbb{R}^{M}\rightarrow\mathbb{R}^{N}\) in Equation (3.2) has the form \[\begin{equation} F(t,X_{t})\overset{\text{def}}{=}\alpha + \beta t, \quad\forall t\in\mathbb{T}. \tag{3.10} \end{equation}\] In addition, the noise process \(\mathbf{N}\) is assumed to be a strong white noise. We will give a detailed characterization of a strong white noise in what follows. For now, it is enough to think of a strong white noise as a process whose paths are obtained by a sequence of independent samplings from the same zero-mean distribution (i.e., each path is the realization of a sequence of independent and identically distributed random variables with zero mean), as in Examples 1.3 and 1.9.

In light of Equation (3.10), we recall that to build the predictor linear model of the process \(\mathbf{Y}\), given the time series \(\left(y_{t}\right)_{t=1}^{T}\equiv\mathbf{y}\), realization of \(\mathbf{Y}\), we have to consider the set of equations \[\begin{equation} y_{t}=\alpha+\beta t+n_{t},\quad t=1,\dots,T, \tag{3.11} \end{equation}\] where \(\alpha\) and \(\beta\) are the parameters of the linear regression function to be determined and \(\left(n_{t}\right)_{t=1}^{T}\equiv\mathbf{n}\) is the unobservable realization of the noise process \(\mathbf{N}\).

The OLS estimate of the vector parameter \(\left(\alpha,\beta\right)\) is the vector \((\hat{\alpha},\hat{\beta})\) which satisfies the minimization problem \[\begin{equation} (\hat{\alpha},\hat{\beta})= \underset{\left(\alpha,\beta\right)\in\mathbb{R}^{2}}{\arg\min}\left\{SSE\left(\alpha,\beta\right)\right\} \tag{3.12} \end{equation}\] where the function \(SSE:\mathbb{R}^{2}\rightarrow\mathbb{R}_{+}\), referred to as the sum of squared errors, is given by \[\begin{equation} SSE\left(\alpha,\beta\right)\overset{\text{def}}{=} {\sum_{t=1}^{T}} \left(y_{t}-\left(\alpha + \beta t\right)\right)^{2}. \tag{3.13} \end{equation}\]

Be aware that the function SSE is also known as the residual sum of squares (RSS).

Writing \(\left(t\right)_{t=1}^{T}\equiv\mathbf{t}\) for the time index sequence, it is not difficult to prove that the ordinary least squares (OLS) estimates \(\hat{\alpha}\) and \(\hat{\beta}\) of the parameters \(\alpha\) and \(\beta\) are given by \[\begin{equation} \hat{\alpha}\equiv\bar{y}_{T}-\hat{\beta}\bar{t}_{T} \quad\text{and}\quad \hat{\beta}\equiv\frac{s_{T}\left(\mathbf{t},\mathbf{y}\right)}{s_{T}^{2}\left(\mathbf{t}\right)}, \tag{3.14} \end{equation}\] where \[\begin{equation} \bar{t}_{T}\equiv\frac{1}{T}{\sum_{t=1}^{T}}t, \qquad \bar{y}_{T}\equiv\frac{1}{T}{\sum_{t=1}^{T}}y_{t}, \tag{3.15} \end{equation}\] \[\begin{equation} s_{T}^{2}\left(\mathbf{t}\right)\equiv\frac{1}{T-1}\sum_{t=1}^{T}\left(t-\bar{t}_{T}\right)^{2}, \qquad s_{T}\left(\mathbf{t},\mathbf{y}\right) \equiv\frac{1}{T-1}\sum_{t=1}^{T}\left(t-\bar{t}_{T}\right)\left(y_{t}-\bar{y}_{T}\right). \tag{3.16} \end{equation}\] Recall that we have \[\begin{equation} \bar{t}_{T}=\frac{T+1}{2}\quad\text{and}\quad s_{T}^{2}\left(\mathbf{t}\right)=\frac{T\left(T+1\right)}{12}. \end{equation}\]
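The closed-form estimates (3.14)-(3.16) and the two time-index identities just recalled can be verified numerically against lm() on a synthetic series; the parameters of the simulated line below are arbitrary.

```r
# Verifying the closed-form OLS estimates (3.14)-(3.16) against lm(),
# together with the identities t-bar = (T+1)/2 and s_T^2(t) = T(T+1)/12.
set.seed(12345)
TT <- 60
t_idx <- 1:TT
y <- 1.5 + 0.3 * t_idx + rnorm(TT)          # synthetic series with linear trend
beta_hat  <- cov(t_idx, y) / var(t_idx)     # s_T(t, y) / s_T^2(t)
alpha_hat <- mean(y) - beta_hat * mean(t_idx)
all.equal(unname(coef(lm(y ~ t_idx))), c(alpha_hat, beta_hat))  # TRUE
all.equal(mean(t_idx), (TT + 1) / 2)        # TRUE: t-bar = (T+1)/2
all.equal(var(t_idx), TT * (TT + 1) / 12)   # TRUE: s_T^2(t) = T(T+1)/12
```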

Definition 3.5 (OLS regression line) We call the regression line of the time series \(\mathbf{y}\) on the time index sequence \(\mathbf{t}\) the straight line with intercept \(\hat{\alpha}\) and slope \(\hat{\beta}\), that is, the straight line represented by the equation \[\begin{equation} y=\hat{\alpha}+\hat{\beta} t, \tag{3.17} \end{equation}\] for \(\left(t,y\right)\in\mathbb{R}^{2}\).

Definition 3.6 (OLS fitted values) We call the OLS estimated values or OLS fitted values of the time series \(\mathbf{y}\) the time series \(\hat{\mathbf{y}}\equiv\left(\hat{y}_{t}\right)_{t=1}^{T}\) given by \[\begin{equation} \hat{y}_{t}\overset{\text{def}}{=}\hat{\alpha}+\hat{\beta} t,\quad\forall t=1,\dots,T. \tag{3.17} \end{equation}\]

Definition 3.7 (OLS residuals) We call the OLS residuals of the time series \(\mathbf{y}\) the time series \(\hat{\mathbf{n}}\equiv\left(\hat{n}_{t}\right)_{t=1}^{T}\) given by \[\begin{equation} \hat{n}_{t}\overset{\text{def}}{=}y_{t}-\hat{y}_{t},\quad\forall t=1,\dots,T. \tag{3.18} \end{equation}\]

Note that both the fitted value \(\hat{y}_{t}\) and the residual \(\hat{n}_{t}\), corresponding to the \(t\)th term \(y_{t}\) of the time series \(\mathbf{y}\), are observable for every \(t=1,\dots,T\). From a graphical point of view, the fitted value \(\hat{y}_{t}\) is the ordinate of the point on the regression line with abscissa \(t\), and the residual \(\hat{n}_{t}\) is the deviation of the point \(\left(t,y_{t}\right)\) of the graph of \(\mathbf{y}\) from the point \(\left(t,\hat{y}_{t}\right)\), that is, the vertical deviation of \(\left(t,y_{t}\right)\) from the regression line, for every \(t=1,\dots,T\).

If the residuals are small in magnitude, then much of the variability in the time series \(\mathbf{y}\) can be explained in terms of the variability in the time index sequence \(\mathbf{t}\) via the linear relationship between the time series and the time index sequence.

Proposition 3.1 (OLS residuals properties) We have \[\begin{equation} {\sum_{t=1}^{T}}\hat{n}_{t}=0 \tag{3.19} \end{equation}\] and \[\begin{equation} {\sum_{t=1}^{T}}\hat{n}_{t}t=0. \tag{3.20} \end{equation}\]

Proposition 3.2 (OLS fitted values property) We have \[\begin{equation} \frac{1}{T}{\sum_{t=1}^{T}}\hat{y}_{t}=\bar{y}_{T}. \tag{3.21} \end{equation}\]
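Propositions 3.1 and 3.2 are easy to check numerically on a synthetic series (the simulated parameters below are arbitrary):

```r
# Numerical check of the residual properties (3.19)-(3.20) and of the
# fitted-values property (3.21) for an OLS time-regression.
set.seed(12345)
TT <- 60
t_idx <- 1:TT
y <- 1.5 + 0.3 * t_idx + rnorm(TT)
fit <- lm(y ~ t_idx)
n_hat <- residuals(fit)
y_hat <- fitted(fit)
sum(n_hat)             # ~ 0   (Equation (3.19))
sum(n_hat * t_idx)     # ~ 0   (Equation (3.20))
mean(y_hat) - mean(y)  # ~ 0   (Equation (3.21))
```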

Definition 3.8 (Total sum of squares) We call the total sum of squares (TSS) or total variation in the time series \(\mathbf{y}\) the sum of the squared deviations of \(\mathbf{y}\) about the horizontal line of equation \(y=\bar{y}_{T}\), that is the positive number \[\begin{equation} TSS\overset{\text{def}}{=} {\sum_{t=1}^{T}}\left(y_{t}-\bar{y}_{T}\right)^{2}. \tag{3.22} \end{equation}\]

The total sum of squares, TSS, expresses the overall variability in the time series \(\mathbf{y}\).

Proposition 3.3 (Total sum of squares) We have \[\begin{equation} TSS={\sum_{t=1}^{T}}\left(y_{t}-\hat{y}_{t}\right)^{2} +{\sum_{t=1}^{T}}\left(\hat{y}_{t}-\bar{y}_{T}\right)^{2}. \tag{3.23} \end{equation}\]

Definition 3.9 (Explained sum of squares) We call the explained sum of squares (ESS) or explained variation, the positive number \[\begin{equation} ESS\overset{\text{def}}{=}{\sum_{t=1}^{T}}\left(\hat{y}_{t}-\bar{y}_{T}\right)^{2}. \tag{3.24} \end{equation}\]

The explained sum of squares, ESS, can be interpreted as the amount of the total variation in the time series \(\mathbf{y}\) which can be explained in terms of the variability in the time index sequence \(\mathbf{t}\) via the linear model. In fact, we have

\[\begin{equation} ESS=\hat{\beta}^{2}\left(T-1\right)s_{T}^{2}\left(\mathbf{t}\right). \tag{3.25} \end{equation}\]
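Equation (3.25) can be checked directly in R, since `var()` uses the denominator \(T-1\), so that \(\left(T-1\right)s_{T}^{2}\left(\mathbf{t}\right)\) equals the sum of the squared deviations of the time indices from their mean (the data below are simulated for illustration):

```r
# Illustrative simulated series with a linear trend
set.seed(2)
t_idx <- 1:30
y <- 2 + 1.5 * t_idx + rnorm(30, sd = 3)
fit <- lm(y ~ t_idx)
ESS <- sum((fitted(fit) - mean(y))^2)
beta_hat <- coef(fit)[["t_idx"]]
# var() uses denominator T - 1, so (T-1)*var(t) = sum((t - mean(t))^2)
all.equal(ESS, beta_hat^2 * (30 - 1) * var(t_idx))
```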

Definition 3.10 (Residual sum of squares) We call the residual sum of squares (RSS) or unexplained variation the sum of the squared deviations about the regression line, that is the positive number \[\begin{equation} RSS\overset{\text{def}}{=}{\sum_{t=1}^{T}}\left(y_{t}-\hat{y}_{t}\right)^{2} \equiv{\sum_{t=1}^{T}}\hat{n}_{t}^{2}. \tag{3.26} \end{equation}\]

Be aware that RSS is also known as the sum of squared residuals (SSR) or the sum of squared errors (SSE).

The residual sum of squares, RSS, can be interpreted as the amount of the total variation in the time series \(\mathbf{y}\) which cannot be explained in terms of the variability in the time index sequence \(\mathbf{t}\) via the linear model. Note that the sum of squared deviations about the regression line is smaller than the sum of squared deviations about any other line. In fact, we have

\[\begin{equation} RSS= \underset{\left(\alpha,\beta\right)\in\mathbb{R}^{2}}{\min}\left\{SSE\left(\alpha,\beta\right)\right\}. \tag{3.27} \end{equation}\]

Summarizing, we have \[\begin{equation} TSS=ESS+RSS, \tag{3.28} \end{equation}\] equivalently \[\begin{equation} 1=\frac{ESS}{TSS}+\frac{RSS}{TSS}. \tag{3.29} \end{equation}\]
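The decomposition in Equation (3.28) can be verified numerically (again on an illustrative simulated series, not on the MARWS data):

```r
# Check TSS = ESS + RSS on a simulated trend-plus-noise series
set.seed(2)
t_idx <- 1:30
y <- 2 + 1.5 * t_idx + rnorm(30, sd = 3)
fit <- lm(y ~ t_idx)
TSS <- sum((y - mean(y))^2)
ESS <- sum((fitted(fit) - mean(y))^2)
RSS <- sum(residuals(fit)^2)
all.equal(TSS, ESS + RSS)  # Equation (3.28)
```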

As a consequence of what was presented above, the quantity ESS/TSS [resp. \(\left(100\cdot ESS/TSS\right)\%\)] represents the proportion [resp. percentage] of TSS which is explained by the linear regression, and the quantity RSS/TSS [resp. \(\left(100\cdot RSS/TSS\right)\%\)] represents the proportion [resp. percentage] of TSS which cannot be explained by the linear regression.

Definition 3.11 (Residual standard error) We call the residual standard error (RSE) the positive number \[\begin{equation} \hat{\sigma}_\mathbf{N}\overset{\text{def}}{=}\frac{1}{\sqrt{T-c}}RSS^{1/2} \equiv\left(\frac{1}{T-c}\sum_{t=1}^{T}\hat{n}_{t}^{2}\right)^{1/2}, \tag{3.30} \end{equation}\] where \(T\) is the length of the time series \(\mathbf{y}\), \(c\) is the number of the coefficients of the linear regression (we clearly have \(c=2\) for a simple linear regression), and \(T-c\equiv df\) is referred to as the degrees of freedom of the residuals of the linear model, that is, the number of the observations minus the number of the estimated parameters in the linear model.

The residual standard error is an estimate of the standard deviation of the error process \(\mathbf{N}\) accounting for the reduction in the degrees of freedom caused by the estimated parameters of the linear model.

Definition 3.12 (Coefficient of determination) We call the coefficient of determination the positive number \[\begin{equation} R^{2}\overset{\text{def}}{=}\frac{ESS}{TSS}. \tag{3.31} \end{equation}\]

Remark (Coefficient of determination). We have \[\begin{equation} R^{2}=\frac{{\sum_{t=1}^{T}}\left(\hat{y}_{t}-\bar{y}_{T}\right)^{2}} {{\sum_{t=1}^{T}}\left(y_{t}-\bar{y}_{T}\right)^{2}}, \tag{3.32} \end{equation}\] and \[\begin{equation} 1-R^{2}=\frac{RSS}{TSS} =\frac{{\sum_{t=1}^{T}}\left(y_{t}-\hat{y}_{t}\right)^{2}} {{\sum_{t=1}^{T}}\left(y_{t}-\bar{y}_{T}\right)^{2}}. \tag{3.33} \end{equation}\]

The coefficient of determination \(R^{2}\) provides a measure of how well the model fits the actual data. More specifically, the statistic \(R^2\) is a measure of the linear relationship between the independent and the dependent variable. The coefficient of determination \(R^2\) always lies between 0 and 1. The closer \(R^2\) is to \(1\), the better the variation in the time series \(\mathbf{y}\) can be explained in terms of the linear dependence on the time index sequence \(\mathbf{t}\) expressed by the regression line. Note that in multiple regression models the statistic \(R^2\) always increases as more variables are considered. Therefore, the adjusted R-squared \(\tilde{R}^2\) is the preferred measure of the variability of the dependent variable explained by the independent variables, as \(\tilde{R}^2\) accounts for the number of variables considered.

Definition 3.13 (Adjusted Coefficient of determination) We call the adjusted coefficient of determination the number \[\begin{equation} \tilde{R}^{2}\overset{\text{def}}{=}1-\frac{RSS/\left(T-c\right)}{TSS/\left(T-1\right)}, \tag{3.34} \end{equation}\] where \(T-c\equiv df\) is the number of the degrees of freedom of the linear model.

Since for a simple linear regression we have \(c=2\), for moderately large \(T\) we have \(\tilde{R}^{2}\approx R^{2}\). The adjusted coefficient of determination \(\tilde{R}^{2}\) turns out to be a statistic more useful than \(R^{2}\) when dealing with multiple linear regressions.

We have \[\begin{equation} \tilde{R}^{2}=1-\frac{T-1}{T-c}\left(1-R^{2}\right) =1-\frac{T-1}{T-c}\frac{{\sum_{t=1}^{T}}\left(y_{t}-\hat{y}_{t}\right)^{2}} {{\sum_{t=1}^{T}}\left(y_{t}-\bar{y}_{T}\right)^{2}}. \tag{3.35} \end{equation}\]

Definition 3.14 (Goodness of fit coefficient) We call the goodness of fit coefficient the positive number \[\begin{equation} F\overset{\text{def}}{=}\frac{T-c}{c-1}\frac{ESS}{RSS}, \tag{3.36} \end{equation}\] where \(T-c\equiv df\) is the number of the degrees of freedom of the linear model.

We have \[\begin{equation} F=\frac{T-c}{c-1}\frac{ESS/TSS}{RSS/TSS}=\cfrac{\cfrac{R^{2}}{c-1}}{\cfrac{1-R^{2}}{T-c}}. \tag{3.37} \end{equation}\]
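Equations (3.35) and (3.37) can be checked numerically against the values returned by summary() on a small simulated example (the series below is illustrative):

```r
# Illustrative simulated trend-plus-noise series
set.seed(3)
T_len <- 40
c_coef <- 2  # number of regression coefficients in a simple linear regression
t_idx <- 1:T_len
y <- 1 + 0.7 * t_idx + rnorm(T_len, sd = 2)
summ <- summary(lm(y ~ t_idx))
Rsq <- summ$r.squared
# Equation (3.35): adjusted R-squared from R-squared
all.equal(1 - (T_len - 1)/(T_len - c_coef)*(1 - Rsq), summ$adj.r.squared)
# Equation (3.37): F-statistic from R-squared
all.equal((Rsq/(c_coef - 1))/((1 - Rsq)/(T_len - c_coef)),
          summ$fstatistic[["value"]])
```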

The goodness of fit coefficient turns out to be the realization of a Fisher-Snedecor statistic with degrees of freedom \(df_{1}^{F}=c-1\) and \(df_{2}^{F}=T-c\). Assuming that the error process is a Gaussian white noise, the Fisher-Snedecor statistic allows an overall evaluation of the linear model by a hypothesis test. The larger the F-statistic is relative to one, the better the linear model fits the time series. On the contrary, the smaller the F-statistic is relative to one, the worse the linear model fits the time series. Recall that, differently from the Student t-statistic, the Fisher-Snedecor statistic is rather sensitive to a possible non-Gaussianity of the error process.

Computationally, we build the linear model for the training set of the time series MARWS from the data frame MARWS_df by means of the R function lm().

MARWS_lm <- lm(RWS[1:TrnS_length]~t[1:TrnS_length], data=MARWS_df)
class(MARWS_lm)
## [1] "lm"

The function lm() returns the object MARWS_lm, which is a list containing all the essential items produced by the linear regression of the training set of the time series MARWS on its time index. For instance, we can retrieve the fitted values stored in the list as MARWS_lm[["fitted.values"]] or MARWS_lm$fitted.values, the residuals as MARWS_lm[["residuals"]] or MARWS_lm$residuals, and the degrees of freedom as MARWS_lm[["df.residual"]] or MARWS_lm$df.residual.

The fitted values.

MARWS_fit <- MARWS_lm[["fitted.values"]]
head(MARWS_fit)
##        1        2        3        4        5        6 
## 848.9408 857.6355 866.3303 875.0250 883.7198 892.4145
tail(MARWS_fit)
##      163      164      165      166      167      168 
## 2257.490 2266.185 2274.880 2283.575 2292.269 2300.964

The residuals.

MARWS_res <- MARWS_lm[["residuals"]]
head(MARWS_res)
##          1          2          3          4          5          6 
## -384.94076 -182.63551 -163.33026   11.97499  255.28024  184.58549
tail(MARWS_res)
##        163        164        165        166        167        168 
##  799.50975 1063.81500 -378.87975 -187.57450   81.73075  234.03600

The degrees of freedom.

MARWS_degfr <- MARWS_lm[["df.residual"]]
show(MARWS_degfr)
## [1] 166

Note that the operator $ does not always allow us to retrieve the desired elements from a list. In our opinion, the best way to learn the correct syntax to retrieve elements from a list is to open the list as a file in the source panel of RStudio, scroll through the list until we find the desired item, and click on the small icon that appears to the far right of the desired item when it is hovered over with the mouse pointer. This will display in the console panel the correct syntax to retrieve the element from the list.
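As a simpler alternative, the names of the components stored in an lm object can be listed with names(), and each component can then be retrieved with the [[ ]] operator. A minimal sketch on the built-in cars dataset (illustrative, not the MARWS data):

```r
# Illustrative fit on the built-in cars dataset (50 observations)
fit <- lm(dist ~ speed, data = cars)
names(fit)            # lists the components stored in the lm object
fit[["df.residual"]]  # retrieve a component by name: 50 - 2 = 48
```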

Other information on the linear model MARWS_lm, which is not in the list, can be obtained by using suitable R functions. For instance, the R function nobs() yields the number of observations

MARWS_nobs <- nobs(MARWS_lm)
show(MARWS_nobs)
## [1] 168

and the R function sigma() yields the residual standard error.

MARWS_RSE <- sigma(MARWS_lm)
show(MARWS_RSE)
## [1] 395.1764

Note that Equation (3.30) holds true.

MARWS_RSE==sqrt((1/MARWS_degfr)*sum(MARWS_res^2))
## [1] TRUE

Actually, considering \(18\) decimal digits,

show(c(sprintf(MARWS_RSE, fmt="%.18f"), sprintf(sqrt((1/MARWS_degfr)*sum(MARWS_res^2)), fmt="%.18f")))
## [1] "395.176406455728965739" "395.176406455728965739"

Note also that the standard deviation of the residuals is slightly different from the residual standard error. In fact,

sd(MARWS_res)==sqrt((1/(MARWS_nobs-1))*sum(MARWS_res^2))
## [1] FALSE

Actually, considering \(18\) decimal digits,

show(c(sprintf(sd(MARWS_res), fmt="%.18f"), sprintf(sqrt((1/(MARWS_nobs-1))*sum(MARWS_res^2)), fmt="%.18f")))
## [1] "393.991467037282461661" "393.991467037282518504"

Pretty detailed information can be obtained by the R function summary().

MARWS_lm_summ <- summary(MARWS_lm)
show(MARWS_lm_summ)
## 
## Call:
## lm(formula = RWS[1:TrnS_length] ~ t[1:TrnS_length], data = MARWS_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1314.32  -193.09    11.72   236.53  1171.19 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      840.2460    61.2503   13.72   <2e-16 ***
## t[1:TrnS_length]   8.6947     0.6287   13.83   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 395.2 on 166 degrees of freedom
## Multiple R-squared:  0.5354, Adjusted R-squared:  0.5326 
## F-statistic: 191.3 on 1 and 166 DF,  p-value: < 2.2e-16

The Residuals section of the summary output presents five summary points. These points express the symmetry of the residuals about their mean value zero. Recall that a zero mean characterizes the residuals of any linear regression (see Equation (3.19)).

The stronger the symmetry of the summary points about zero, the better the linear model fits the data. A lack of symmetry means that some fitted values \(\hat{y}_{t}\) fall far away from the corresponding observed values \(y_{t}\). More specifically, the closer the summary points are to the corresponding summary points of the zero-centered Gaussian distribution with standard deviation given by the residual standard error, the better the model fits the data.

In this case, the five summary points of the Residuals section show some lack of symmetry and do not match well the corresponding summary points of the zero-centered Gaussian distribution with standard deviation equal to the residual standard error. In fact, considering that

show(c(round(qnorm(0.25, mean = 0, sd = MARWS_RSE, lower.tail = TRUE),2), round(qnorm(0.75, mean = 0, sd = MARWS_RSE, lower.tail = TRUE),2)))
## [1] -266.54  266.54
show(c(round(-3*MARWS_RSE,2),round(3*MARWS_RSE,2)))
## [1] -1185.53  1185.53

for such a Gaussian distribution we have the following theoretical values (to be compared with the residual summary points in the last row): \[\begin{equation} \begin{array}{ccccc} Min\,(99.73\%) & 1Q & Median & 3Q & Max\,(99.73\%)\\ -1185.53 & -266.54 & 0.00 & 266.54 & 1185.53\\ -1314.32 & -193.09 & 11.72 & 236.53 & 1171.19 \end{array} \end{equation}\]

Recall that for any Gaussian distribution with mean \(\mu\) and standard deviation \(\sigma\) we have \[\begin{equation} Min\,(99.73\%)= \mu-3\sigma, \qquad Max\,(99.73\%)= \mu+3\sigma, \end{equation}\] and \[\begin{equation} \Phi_{\mu,\sigma}(1Q) = 0.25, \qquad \Phi_{\mu,\sigma}(3Q) = 0.75, \end{equation}\] where \(\Phi_{\mu,\sigma} :\mathbb{R}\rightarrow \mathbb{R}\) is the cumulative distribution function of the Gaussian density with mean \(\mu\) and standard deviation \(\sigma\), given by \[\begin{equation*} \Phi_{\mu,\sigma}\left(x\right)\overset{\text{def}}{=}\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{\left(u-\mu\right)^{2}}{2\sigma^{2}}}du, \quad x\in \mathbb{R}. \end{equation*}\]
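These quantities can be double-checked in R: numerically integrating the Gaussian density, with its normalizing constant \(1/(\sigma\sqrt{2\pi})\), reproduces the distribution function computed by pnorm(). The values of \(\mu\) and \(\sigma\) below are illustrative (we take \(\sigma\) equal to the residual standard error of MARWS_lm):

```r
# Numerical check: the integral of the Gaussian density equals pnorm()
mu <- 0
sigma <- 395.1764  # illustrative: the residual standard error of MARWS_lm
dens <- function(u) exp(-(u - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
int_val <- integrate(dens, lower = -Inf, upper = mu + 0.67448 * sigma)$value
all.equal(int_val, pnorm(mu + 0.67448 * sigma, mean = mu, sd = sigma),
          tolerance = 1e-6)
# Both values are close to 0.75, the level of the third quartile
```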

We also recall that, given the first and third quartiles \(1Q\) and \(3Q\), respectively, the mean \(\mu\) and standard deviation \(\sigma\) of the corresponding Gaussian distribution satisfy the equations \[\begin{equation} 1Q = \mu - 0.67448\,\sigma \quad\text{and}\quad 3Q = \mu + 0.67448\,\sigma, \tag{3.38} \end{equation}\] the real numbers \(-0.67448\) and \(0.67448\) being the first and third quartiles of the standard Gaussian distribution, respectively. From Equation (3.38) it follows \[\begin{equation} \mu=\frac{1Q+3Q}{2} \quad\text{and}\quad \sigma=\frac{3Q-1Q}{2\cdot 0.67448}. \tag{3.39} \end{equation}\] Therefore, retrieving the estimates for \(1Q\) and \(3Q\) from the summary of MARWS_lm, by means of

MARWS_lm_res_summ <- summary(MARWS_lm$residuals)
show(MARWS_lm_res_summ)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1314.32  -193.09    11.72     0.00   236.53  1171.19

we obtain the quartile-estimates for \(\mu\) and \(\sigma\), respectively given by

quart_mean <- as.vector(MARWS_lm_res_summ[2]+MARWS_lm_res_summ[5])/2
show(quart_mean)
## [1] 21.71993

and

quart_sd <- as.vector(MARWS_lm_res_summ[5]-MARWS_lm_res_summ[2])/(2*0.67448)
show(quart_sd)
## [1] 318.4769

The following code chunk checks the correctness of our computations.

Q1 <- round(qnorm(0.25, mean = quart_mean, sd = quart_sd, lower.tail = TRUE), digits=14)
Q3 <- round(qnorm(0.75, mean = quart_mean, sd = quart_sd, lower.tail = TRUE), digits=14)
show(c(Q1, Q3))
## [1] -193.0895  236.5293
show(c(MARWS_lm_res_summ[2],MARWS_lm_res_summ[5]))
##   1st Qu.   3rd Qu. 
## -193.0864  236.5262

As a consequence, if the residuals of the linear model MARWS_lm were generated by independent sampling from a Gaussian distribution, then the quartile-estimates of the mean and standard deviation of such a Gaussian distribution would take the approximate values \(21.7199\) and \(318.4769\). Seemingly, these are rather far from \(0\) and \(395.1764\), respectively. To answer the question “how far?”, we consider the confidence intervals of the mean and the standard deviation of the residuals under the assumption that they are generated by independent sampling from a Gaussian distribution. To this end, we exploit the R functions t.test() of the package stats and varTest() of the package EnvStats.

Res_t_test <- t.test(x=MARWS_res, alternative = "two.sided", mu=0, conf.level=0.95)
show(c(sprintf(Res_t_test$estimate, fmt="%.4f"), sprintf(Res_t_test$conf.int, fmt="%.4f")))
## [1] "-0.0000"  "-60.0121" "60.0121"
Res_chisq_test <- EnvStats::varTest(x=MARWS_res, alternative="two.sided", sigma.squared=MARWS_lm_summ$sigma^2, conf.level=0.95)
show(c(sprintf(Res_chisq_test$estimate, fmt="%.4f"), sprintf(Res_chisq_test$conf.int, fmt="%.4f")))
## [1] "155229.2761" "126656.0070" "194750.0487"

Under the assumption of Gaussian distributed residuals, the quartile-estimate of the mean of the residuals, \(21.7199\), is in the \(95\%\) confidence interval \([-60.0121, 60.0121]\) of the theoretical value \(0\), but the quartile-estimate of the variance, given by the square of the quartile-estimate of the standard deviation, that is \(318.4769^2=101427.5358\), is not in the \(95\%\) confidence interval \([126656.0070, 194750.0487]\) of the square of the residual standard error \(395.1764^2=156164.3922\). This is computational evidence against the Gaussianity of the distribution which generates the residuals of the linear model.

For completeness, we also compute the skewness and the kurtosis, jointly with the \(95\%\) confidence intervals derived under the assumption of Gaussian distributed residuals.

Skew_Gauss <- DescTools::Skew(x=MARWS_res, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "classic")
show(Skew_Gauss)
##   skewness     lwr.ci     upr.ci 
## -0.1420802 -0.3671480  0.3671480
Kurt_Gauss <- DescTools::Kurt(x=MARWS_res, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "classic")
show(Kurt_Gauss)
##   kurtosis     lwr.ci     upr.ci 
##  0.9224319 -0.7301426  0.7301426

Using the option ci.type = “classic”, the confidence intervals for the skewness and excess kurtosis are computed under the assumption of Gaussian distributed residuals. The estimated skewness is in the \(95\%\) confidence interval, but the estimated excess kurtosis is not. This strengthens the computational evidence against the Gaussianity assumption.

Thus, the residuals of the linear model MARWS_lm appear to be unskewed but leptokurtic at the \(95\%\) confidence level.

Summarizing, a basic analysis of the residuals of the linear model MARWS_lm provides evidence to reject the assumption that the residuals are generated by independent sampling from the zero-centered Gaussian distribution with standard deviation given by the residual standard error.

Going back to the summary of the linear model MARWS_lm, we observe that the Coefficients section of the output presents two rows of 4 summary points.

MARWS_lm_res_coeff <- MARWS_lm_summ[["coefficients"]]
show(MARWS_lm_res_coeff)
##                   Estimate Std. Error  t value     Pr(>|t|)
## (Intercept)      840.24601 61.2502802 13.71824 4.061156e-29
## t[1:TrnS_length]   8.69475  0.6286739 13.83030 1.968840e-29

These points are related to the estimates of the linear regression coefficients.

The first [resp. second] Coefficients Estimate is the ordinary least square (OLS) estimated value of the intercept [resp. slope] of the regression line.

The first [resp. second] Coefficients standard error, Std. Error, measures the average amount by which the estimated intercept [resp. slope] regression parameter may vary from the true value, that is, the standard deviation of the unbiased intercept [resp. slope] estimator. The smaller the Coefficient Standard Error (relative to the corresponding Coefficient Estimate), the better the fit of the model.

The Coefficients t-value, that is, the value of the Student t-statistic under the null hypothesis that the corresponding coefficient true value is \(0\), is a measure of how many standard deviations the corresponding Coefficient Estimate is away from \(0\). The higher the absolute t-value, the stronger the rejection of the null hypothesis that the corresponding coefficient true value is \(0\).

The Coefficients p-value, \(Pr(>|t|)\), that is, the p-value of the Student t-statistic under the null hypothesis that the corresponding coefficient true value is \(0\), is the probability of obtaining a value of the t-statistic at least as extreme, in absolute value, as the one observed, if the null hypothesis were true. The smaller the coefficient p-value, the stronger the rejection of the null hypothesis that the corresponding coefficient true value is \(0\).

However, we have to stress that the reliability of the Coefficients standard error, t-value, and p-value depends on the hypothesis that the error process in the linear model is a Gaussian white noise, that is, a sequence of independent and identically distributed Gaussian random variables. Under this assumption, it turns out that the residuals of the linear model are also stationary, not autocorrelated, homoskedastic, and Gaussian distributed. Still under the Gaussian white noise assumption for the error process, the estimators for the intercept and slope of the linear regression turn out to be the best linear unbiased estimators (BLUE). Therefore, to assess the adequacy of the linear model, a deeper analysis of the residuals is in order. However, recall that the failure of the sole hypothesis of Gaussian distributed residuals, the other hypotheses being verified, does not significantly alter the results of the t-tests, provided we have a large number of observations.
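One standard step of such a deeper analysis is to test the residuals for autocorrelation. A minimal sketch using the Ljung-Box test provided by the stats function Box.test(), on simulated data rather than on MARWS:

```r
# Ljung-Box test for autocorrelation of the residuals (illustrative data)
set.seed(4)
t_idx <- 1:120
y <- 3 + 0.5 * t_idx + rnorm(120)
fit <- lm(y ~ t_idx)
lb <- Box.test(residuals(fit), lag = 12, type = "Ljung-Box")
show(lb)
# A large p-value is consistent with uncorrelated residuals
```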

Putting aside for a while the analysis of the residuals of the linear model MARWS_lm, we proceed with the examination of the other elements of the summary.

The Residual Standard Error (RSE), that is, the estimate of the standard deviation of the error process \(\mathbf{N}\), measures the overall quality of the linear regression fit. In a linear regression model, the best single error statistic to consider is the RSE. The lower the RSE, the better the linear regression fit. Actually, the estimated coefficients of the linear regression are obtained by minimizing the sum of squared errors \(SSE\), and we have \[\begin{equation} RSE=\frac{1}{\sqrt{T-c}}RSS^{1/2} =\frac{1}{\sqrt{T-c}}\left(\underset{\left(\alpha,\beta\right)\in\mathbb{R}^{2}}{\min}\left\{SSE\left(\alpha,\beta\right)\right\}\right)^{1/2} \tag{3.40} \end{equation}\] (see Equations (3.27) and (3.30)).

In our case, we have already shown the validity of Equation (3.40) when presenting the R function sigma().

The Multiple R-squared, or coefficient of determination, is given by Equation (3.31), where the total sum of squares TSS and the explained sum of squares ESS are given by Equations (3.22) and (3.24), respectively. In our case we have

MARWS_lm_TSS <- sum((MARWS_lm[["model"]][["RWS[1:TrnS_length]"]] - mean(MARWS_lm[["model"]][["RWS[1:TrnS_length]"]]))^2)
show(MARWS_lm_TSS)
## [1] 55793990

and

MARWS_lm_ESS <- sum((MARWS_lm[["fitted.values"]] - mean(MARWS_lm[["model"]][["RWS[1:TrnS_length]"]]))^2)
show(MARWS_lm_ESS)
## [1] 29870701

On the other hand,

MARWS_lm_Rsq <- MARWS_lm_summ[["r.squared"]]
show(MARWS_lm_summ[["r.squared"]])
## [1] 0.5353749

and we have

MARWS_lm_Rsq==MARWS_lm_ESS/MARWS_lm_TSS
## [1] TRUE

Actually,

show(c(sprintf(MARWS_lm_Rsq, fmt="%.18f"), sprintf(MARWS_lm_ESS/MARWS_lm_TSS, fmt="%.18f")))
## [1] "0.535374880244309681" "0.535374880244309681"

We recall that the statistic \(R^2\) is the portion/percentage of the variability of the explained variable which can be explained, via the linear model, in terms of the explanatory variable. In contrast, the statistic \(1-R^2=RSS/TSS\) is the portion/percentage of the variability of the explained variable which cannot be explained, via the linear model, in terms of the explanatory variable.

With regard to the adjusted \(R^2\), in our case we have

MARWS_lm_adj_Rsq <- MARWS_lm_summ[["adj.r.squared"]]
show(MARWS_lm_summ[["adj.r.squared"]])
## [1] 0.5325759

and we have (see Equation (3.35))

MARWS_lm_adj_Rsq==1-(nobs(MARWS_lm)-1)/(MARWS_degfr)*(1-MARWS_lm_Rsq)
## [1] TRUE

Actually,

show(c(sprintf(MARWS_lm_adj_Rsq, fmt="%.18f"), sprintf(1-(nobs(MARWS_lm)-1)/(MARWS_degfr)*(1-MARWS_lm_Rsq), fmt="%.18f")))
## [1] "0.532575933739757312" "0.532575933739757312"

To complete the examination of the summary, we consider the F-statistic. As already mentioned, the F-statistic is a good indicator of whether there is a linear relationship between the explained and the explanatory variable. In our case, we have

MARWS_lm_F_stat <- MARWS_lm_summ[["fstatistic"]]
show(MARWS_lm_F_stat)
##    value    numdf    dendf 
## 191.2773   1.0000 166.0000

and we have (see Equation (3.37))

MARWS_lm_F_stat[["value"]]==(MARWS_degfr/(nobs(MARWS_lm)-1-(MARWS_degfr)))*((MARWS_lm_Rsq)/(1-MARWS_lm_Rsq))
## [1] TRUE

Actually,

show(c(sprintf(MARWS_lm_F_stat[["value"]], fmt="%.18f"), sprintf((MARWS_degfr/(nobs(MARWS_lm)-1-(MARWS_degfr)))*((MARWS_lm_Rsq)/(1-MARWS_lm_Rsq)), fmt="%.18f")))
## [1] "191.277282139386500148" "191.277282139386500148"

Note that the small p-value of the F-statistic

MARWS_lm_F_p_val <- pf(MARWS_lm_F_stat[["value"]], df1=MARWS_lm_F_stat[["numdf"]], df2=MARWS_lm_F_stat[["dendf"]], lower.tail = FALSE)
show(MARWS_lm_F_p_val)
## [1] 1.96884e-29

implies the rejection of the null hypothesis that there is no significant linear relationship between the explanatory variable and the explained variable.

Going back to the residuals of the linear model MARWS_lm, a visual inspection is possible by means of the following code chunk, which produces a draft scatter plot of residuals vs fitted values.

plot(MARWS_lm,1)

The Residuals vs Fitted scatter plot shows whether the residuals have a non-linear pattern: if there is a non-linear relationship between the explained and the explanatory variable, the pattern should show up in this plot. Residuals equally spread around a horizontal line, without distinct patterns, are a good indication of the absence of a non-linear relationship between the explained and the explanatory variable. A red LOESS line very close to the horizontal line is another piece of visual evidence for the absence of a non-linear relationship between the explained and the explanatory variable.

In this case, from the Residuals vs Fitted scatter plot, we have visual evidence for stationarity in mean of the residuals, so that the linear model seems to be able to explain the trend. On the other hand, there is visual evidence of heteroskedasticity: the spread of the residuals around the LOESS line increases. In terms of seasonality, we think it is more informative to consider the following Residuals vs Indices scatter plot.

We add the fitted values and the residuals of the linear model for the training set of Monthly AU Red Wine Sales to the MARWS_df data frame.

MARWS_lm_fitted <- c(as.vector(MARWS_lm[["fitted.values"]]), rep(NA, (length-TrnS_length)))
MARWS_lm_residuals <- c(as.vector(MARWS_lm[["residuals"]]), rep(NA, (length-TrnS_length)))
MARWS_df <- add_column(MARWS_df, MARWS_lm_fit=MARWS_lm_fitted, MARWS_lm_res=MARWS_lm_residuals, .after="RWS")
head(MARWS_df)
##   t Year Month  RWS MARWS_lm_fit MARWS_lm_res
## 1 1 1980   Jan  464     848.9408   -384.94076
## 2 2 1980   Feb  675     857.6355   -182.63551
## 3 3 1980   Mar  703     866.3303   -163.33026
## 4 4 1980   Apr  887     875.0250     11.97499
## 5 5 1980   May 1139     883.7198    255.28024
## 6 6 1980   Jun 1077     892.4145    184.58549
tail(MARWS_df,20)
##       t Year Month  RWS MARWS_lm_fit MARWS_lm_res
## 168 168 1993   Dec 2535     2300.964      234.036
## 169 169 1994   Jan 1041           NA           NA
## 170 170 1994   Feb 1728           NA           NA
## 171 171 1994   Mar 2201           NA           NA
## 172 172 1994   Apr 2455           NA           NA
## 173 173 1994   May 2204           NA           NA
## 174 174 1994   Jun 2660           NA           NA
## 175 175 1994   Jul 3670           NA           NA
## 176 176 1994   Aug 2665           NA           NA
## 177 177 1994   Sep 2639           NA           NA
## 178 178 1994   Oct 2226           NA           NA
## 179 179 1994   Nov 2586           NA           NA
## 180 180 1994   Dec 2684           NA           NA
## 181 181 1995   Jan 1185           NA           NA
## 182 182 1995   Feb 1749           NA           NA
## 183 183 1995   Mar 2459           NA           NA
## 184 184 1995   Apr 2618           NA           NA
## 185 185 1995   May 2585           NA           NA
## 186 186 1995   Jun 3310           NA           NA
## 187 187 1995   Jul 3923           NA           NA

To simplify the following code chunks we shorten the data frame MARWS_df to the sole in-sample set.

MARWS_train_df <- MARWS_df[1:TrnS_length,]
head(MARWS_train_df)
##   t Year Month  RWS MARWS_lm_fit MARWS_lm_res
## 1 1 1980   Jan  464     848.9408   -384.94076
## 2 2 1980   Feb  675     857.6355   -182.63551
## 3 3 1980   Mar  703     866.3303   -163.33026
## 4 4 1980   Apr  887     875.0250     11.97499
## 5 5 1980   May 1139     883.7198    255.28024
## 6 6 1980   Jun 1077     892.4145    184.58549
tail(MARWS_train_df)
##       t Year Month  RWS MARWS_lm_fit MARWS_lm_res
## 163 163 1993   Jul 3057     2257.490    799.50975
## 164 164 1993   Aug 3330     2266.185   1063.81500
## 165 165 1993   Sep 1896     2274.880   -378.87975
## 166 166 1993   Oct 2096     2283.575   -187.57450
## 167 167 1993   Nov 2374     2292.269     81.73075
## 168 168 1993   Dec 2535     2300.964    234.03600

Hence, we draw the scatter plot.

Data_df  <- MARWS_train_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of Residuals vs Indices of the Linear Model for the Training Set of Monthly AU Red Wine Sales from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
y_name <- bquote("sales (kliters)")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_lm_res)-min(Data_df$MARWS_lm_res))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_lm_res)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_lm_res)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Residuals vs Index")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_Res_sp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_lm_res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_lm_res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=t, y=MARWS_lm_res, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), 
                                                           linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_Res_sp)

The line plot of residuals vs indices is produced by the following code chunk.

Data_df  <- MARWS_train_df
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of Residuals vs Indices of the Linear Model for the Training Set of Monthly AU Red Wine Sales from ", .(First_Date), " to ", .(Last_Date))))
MARWS_Res_lp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_lm_res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_lm_res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=MARWS_lm_res, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(linetype=c("solid", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_Res_lp)

As with the inspection of the Residuals vs Fitted scatter plot, an almost flat LOESS line is strong visual evidence that the residuals are stationary in mean. Nevertheless, to test the stationarity of the residuals computationally, one usually applies a pair of tests: the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. The ADF test assumes the null hypothesis of non-stationarity. More specifically, the ADF test assumes that the time series is generated by a stochastic process with a random walk component. This null hypothesis is why the ADF test is referred to as a unit root test. On the contrary, the KPSS test assumes the null hypothesis of stationarity; specifically, it assumes that the time series is generated by an auto-regressive process. When the ADF test rejects its null and the KPSS test does not reject its own, we have evidence for stationarity of the time series. When the ADF test does not reject its null and the KPSS test rejects its own, we have evidence for non-stationarity. The other cases are considered doubtful.

Both the ADF and KPSS tests are somewhat complex. In this case, due to the strong visual evidence for stationarity in mean, we apply them only in their simplest forms. Note that the ADF test in its simplest form is referred to as the Dickey-Fuller (DF) test.

The DF test.

# library(urca)                 # The library for this version of the test.
y <- MARWS_lm[["residuals"]]    # The data set to be tested.
num_lags <- 0                   # Setting the lag parameter for the test.

MARWS_lm_res_DF_none <- ur.df(y, type="none", lags=num_lags, selectlags="Fixed")    
# Applying the form of the DF test which considers the null hypothesis that the data set is generated by 
# a process with a random walk component, while the alternative hypothesis is that the data set is generated 
# by an autoregressive process with neither drift nor trend.

summary(MARWS_lm_res_DF_none)   # Showing the result of the test
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1535.46  -118.03    15.38   196.98   999.32 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## z.lag.1 -0.58068    0.07031  -8.259 4.41e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 357.6 on 166 degrees of freedom
## Multiple R-squared:  0.2912, Adjusted R-squared:  0.287 
## F-statistic: 68.21 on 1 and 166 DF,  p-value: 4.407e-14
## 
## 
## Value of test-statistic is: -8.2592 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

The test statistic of the DF test takes a value inside the rejection region at the significance level \(\alpha=0.01\), that is, \(\alpha=1\%\). Therefore, we can reject the unit root null hypothesis in favor of the mean stationary alternative.
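
The same comparison can be performed programmatically. The following sketch extracts the test statistic and the critical values from the S4 object returned by ur.df (slots @teststat and @cval) and checks whether the statistic falls in the left-tail rejection region; it is illustrated on a simulated stationary AR(1) path rather than on our residuals.

```r
library(urca)

set.seed(123)
z <- as.numeric(arima.sim(model = list(ar = 0.4), n = 200))  # a stationary AR(1) path

df_test <- ur.df(z, type = "none", lags = 0, selectlags = "Fixed")
stat <- df_test@teststat[1]     # the tau1 statistic
crit <- df_test@cval[1, ]       # critical values at 1pct, 5pct, 10pct

# The DF rejection region is the left tail: reject the unit root null
# when the statistic falls below the critical value.
rejected_at_1pct <- unname(stat < crit["1pct"])
show(c(statistic = stat, crit))
```

Since the simulated path is stationary by construction, the statistic should fall well inside the rejection region, just as it does for our residuals.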

We apply the KPSS test.

# library(urca)                 # The library for this version of the test
y <- MARWS_lm[["residuals"]]    # The data set to be tested

MARWS_lm_res_KPSS_mu <- ur.kpss(y, type="mu", lags="nil", use.lag=NULL)    
# Applying the simplest form of the KPSS test, which considers the null hypothesis that
# the data set is generated by an autoregressive process with constant mean,
# while the alternative hypothesis is that the data set is generated by a process with a random walk component.

summary(MARWS_lm_res_KPSS_mu)    # Showing the result of the test
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 0 lags. 
## 
## Value of test-statistic is: 0.0708 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic of the KPSS test takes a value outside the rejection region at the significance level \(\alpha=0.1\), that is, \(\alpha=10\%\). Therefore, we cannot reject the mean stationary null hypothesis in favor of the unit root alternative.

Note that, when the goal is rejecting the null hypothesis, the lower the value of the significance level \(\alpha\), the stronger the rejection. On the contrary, when the goal is not rejecting the null hypothesis, the higher the value of the significance level \(\alpha\) (among the standard ones), the stronger the non-rejection.

In light of

  1. the visual evidence from the scatter plots of Residuals vs Fitted and Residuals vs Indices;

  2. the rejection of the unit root null hypothesis of the DF test in favor of the mean stationary alternative;

  3. the lack of rejection of the mean stationary null hypothesis of the KPSS test in favor of the unit root alternative;

we have collected significant evidence to conclude that the residuals of the MARWS_lm linear model have been generated by a process with stationary (zero) mean.
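
The joint DF/KPSS decision rule described above can be sketched as a small helper (the function name assess_stationarity is ours, not a library function):

```r
# A sketch of the joint DF/KPSS decision rule described above.
# assess_stationarity is our own helper, not a library function.
assess_stationarity <- function(adf_rejects_null, kpss_rejects_null) {
  if (adf_rejects_null && !kpss_rejects_null) {
    "evidence for stationarity"
  } else if (!adf_rejects_null && kpss_rejects_null) {
    "evidence for non-stationarity"
  } else {
    "doubtful"
  }
}

# Our case: the DF test rejects its null, the KPSS test does not reject its own.
assess_stationarity(adf_rejects_null = TRUE, kpss_rejects_null = FALSE)
```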

Referring to the likely residual heteroskedasticity, another visual inspection is possible by drawing the so-called Scale-Location plot, that is, the scatter plot of the square roots of the absolute values of the residuals vs the fitted values.

plot(MARWS_lm,3)

Points equally spread around a horizontal line, without distinct patterns, are visual evidence of homoskedasticity in the residuals, that is, of the absence of heteroskedasticity in the error process. A red LOESS line very close to the horizontal line is further visual evidence of the absence of heteroskedasticity. In this case, however, the Scale-Location plot shows a pattern evidencing heteroskedasticity.

The same plot can be drawn with some additional detail, to better highlight the slope of the regression and LOESS lines.

We add the square root of the absolute residuals to the MARWS_df data frame.

length <- nrow(MARWS_df)
MARWS_lm_sqrt_abs_residuals <- c(as.vector(sqrt(abs(MARWS_lm[["residuals"]]))), rep(NA, (length-TrnS_length)))
MARWS_df <- add_column(MARWS_df, MARWS_lm_sqrt_abs_res=MARWS_lm_sqrt_abs_residuals, .after="MARWS_lm_res")
head(MARWS_df)
##   t Year Month  RWS MARWS_lm_fit MARWS_lm_res MARWS_lm_sqrt_abs_res
## 1 1 1980   Jan  464     848.9408   -384.94076              19.61991
## 2 2 1980   Feb  675     857.6355   -182.63551              13.51427
## 3 3 1980   Mar  703     866.3303   -163.33026              12.78007
## 4 4 1980   Apr  887     875.0250     11.97499               3.46049
## 5 5 1980   May 1139     883.7198    255.28024              15.97749
## 6 6 1980   Jun 1077     892.4145    184.58549              13.58622
tail(MARWS_df,20)
##       t Year Month  RWS MARWS_lm_fit MARWS_lm_res MARWS_lm_sqrt_abs_res
## 168 168 1993   Dec 2535     2300.964      234.036              15.29824
## 169 169 1994   Jan 1041           NA           NA                    NA
## 170 170 1994   Feb 1728           NA           NA                    NA
## 171 171 1994   Mar 2201           NA           NA                    NA
## 172 172 1994   Apr 2455           NA           NA                    NA
## 173 173 1994   May 2204           NA           NA                    NA
## 174 174 1994   Jun 2660           NA           NA                    NA
## 175 175 1994   Jul 3670           NA           NA                    NA
## 176 176 1994   Aug 2665           NA           NA                    NA
## 177 177 1994   Sep 2639           NA           NA                    NA
## 178 178 1994   Oct 2226           NA           NA                    NA
## 179 179 1994   Nov 2586           NA           NA                    NA
## 180 180 1994   Dec 2684           NA           NA                    NA
## 181 181 1995   Jan 1185           NA           NA                    NA
## 182 182 1995   Feb 1749           NA           NA                    NA
## 183 183 1995   Mar 2459           NA           NA                    NA
## 184 184 1995   Apr 2618           NA           NA                    NA
## 185 185 1995   May 2585           NA           NA                    NA
## 186 186 1995   Jun 3310           NA           NA                    NA
## 187 187 1995   Jul 3923           NA           NA                    NA

Again, we shorten the data frame MARWS_df to the in-sample set only.

MARWS_train_df <- MARWS_df[1:TrnS_length,]
head(MARWS_train_df)
##   t Year Month  RWS MARWS_lm_fit MARWS_lm_res MARWS_lm_sqrt_abs_res
## 1 1 1980   Jan  464     848.9408   -384.94076              19.61991
## 2 2 1980   Feb  675     857.6355   -182.63551              13.51427
## 3 3 1980   Mar  703     866.3303   -163.33026              12.78007
## 4 4 1980   Apr  887     875.0250     11.97499               3.46049
## 5 5 1980   May 1139     883.7198    255.28024              15.97749
## 6 6 1980   Jun 1077     892.4145    184.58549              13.58622
tail(MARWS_train_df)
##       t Year Month  RWS MARWS_lm_fit MARWS_lm_res MARWS_lm_sqrt_abs_res
## 163 163 1993   Jul 3057     2257.490    799.50975             28.275603
## 164 164 1993   Aug 3330     2266.185   1063.81500             32.616177
## 165 165 1993   Sep 1896     2274.880   -378.87975             19.464834
## 166 166 1993   Oct 2096     2283.575   -187.57450             13.695784
## 167 167 1993   Nov 2374     2292.269     81.73075              9.040506
## 168 168 1993   Dec 2535     2300.964    234.03600             15.298235

We draw again the scatter plot of Square Roots of Absolute Residuals vs Fitted Values of the MARWS_lm model.

# The scatter plot
Data_df  <- MARWS_train_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scale Location Plot of Residuals in the Linear Model for the Monthly AU Red Wine Sales In-Sample Set from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("fitted values")
x_breaks_num <- 10
x_breaks_low <- min(Data_df$MARWS_lm_fit)
x_breaks_up <-  max(Data_df$MARWS_lm_fit)
x_binwidth <- round((x_breaks_up-x_breaks_low)/x_breaks_num, digits=1)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote("square roots of absolute residuals")
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_lm_sqrt_abs_res)-min(Data_df$MARWS_lm_sqrt_abs_res))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_lm_sqrt_abs_res)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_lm_sqrt_abs_res)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 1.0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("square roots of absolute residuals vs fitted")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_Sqrt_Abs_Res_sp <- ggplot(Data_df) +
  geom_hline(yintercept = mean(Data_df$MARWS_lm_sqrt_abs_res), size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=MARWS_lm_fit, y=MARWS_lm_sqrt_abs_res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=MARWS_lm_fit, y=MARWS_lm_sqrt_abs_res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=MARWS_lm_fit, y=MARWS_lm_sqrt_abs_res, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), 
                                                           linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_Sqrt_Abs_Res_sp)

We also draw the line plot of Square Roots of Absolute Residuals vs Indices of the MARWS_lm model to better highlight the seasonality.

Data_df <- MARWS_train_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of Square Roots of Absolute Residuals of the Linear Model for the Monthly AU Red Wine Sales In-Sample Set from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote("square roots of absolute residuals")
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_lm_sqrt_abs_res)-min(Data_df$MARWS_lm_sqrt_abs_res))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_lm_sqrt_abs_res)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_lm_sqrt_abs_res)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 1.0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("square roots of absolute residuals vs indices")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs   <- c(col_1, col_2, col_3)
leg_cols   <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_Sqrt_Abs_Res_lp <- ggplot(Data_df) +
  geom_hline(yintercept = mean(Data_df$MARWS_lm_sqrt_abs_res), size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_lm_sqrt_abs_res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_lm_sqrt_abs_res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=MARWS_lm_sqrt_abs_res, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(linetype=c("solid", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_Sqrt_Abs_Res_lp)

The computational tests usually applied to detect heteroskedasticity in time series are the Breusch-Pagan (BP) and the White (W) tests. Under the null hypothesis, the BP and W tests assume that the terms \(N_t\) of the error process \(\mathbf{N}\) are independent and identically Gaussian distributed (in particular, homoskedastic). The BP and W tests check whether the variance of the error terms depends on the values of the explanatory variables of an auxiliary linear regression. More specifically, the auxiliary linear regression of the BP test fits the squared residuals of the original linear regression to the fitted values; the auxiliary linear regression of the W test fits the squared residuals of the original linear regression to the regressors plus their squared terms and interactions. The tests reject the null if too much of the variance of the squared residuals is explained by the explanatory variables of the auxiliary linear regression.

Summarizing, we have: null hypothesis - equal/constant variances of the error terms; alternative hypothesis - unequal/non-constant variances of the error terms.

The BP and W tests are \(\chi^{2}\) tests. The higher the \(\chi^{2}\) value, or equivalently the lower the p-value (Prob > Chi2), the more unlikely it is that the error terms are homoskedastic.

The option studentize is important when dealing with residuals with a heavy-tailed distribution. The option varformula allows the specification of the auxiliary regression, and hence the introduction of the White test.

Ready-to-use implementations of the BP and W tests are contained in the libraries lmtest and olsrr.

By the following code chunks, we show the application of these tests to our case. First, we build a convenient data frame and linear model.

Data_df  <- data.frame(x=MARWS_train_df$t, y=MARWS_train_df$RWS, y_fit=MARWS_train_df$MARWS_lm_fit, y_res=MARWS_train_df$MARWS_lm_res)
Data_lm <- lm(y~x, data=Data_df)                              # The original linear model

Second, we show the application of the tests.

# Unstudentized Breusch-Pagan test
lmtest::bptest(formula = y~x, varformula = NULL, studentize = FALSE, data=Data_df)
## 
##  Breusch-Pagan test
## 
## data:  y ~ x
## BP = 13.843, df = 1, p-value = 0.0001987
# More briefly
lmtest::bptest(Data_lm, studentize = FALSE)
## 
##  Breusch-Pagan test
## 
## data:  Data_lm
## BP = 13.843, df = 1, p-value = 0.0001987
# Alternatively (olsrr - breusch pagan test)
olsrr::ols_test_breusch_pagan(Data_lm, fitted.values = TRUE, rhs = FALSE)
## 
##  Breusch Pagan Test for Heteroskedasticity
##  -----------------------------------------
##  Ho: the variance is constant            
##  Ha: the variance is not constant        
## 
##             Data              
##  -----------------------------
##  Response : y 
##  Variables: fitted values of y 
## 
##          Test Summary           
##  -------------------------------
##  DF            =    1 
##  Chi2          =    13.84332 
##  Prob > Chi2   =    0.0001987016

The studentized Breusch-Pagan test was proposed by R. Koenker in his 1981 article "A Note on Studentizing a Test for Heteroscedasticity". The most obvious difference between the two tests is that they use different test statistics. The studentized test statistic differs from the non-studentized one by a factor proportional to the reciprocal of the kurtosis of the residuals of the linear model underlying the Breusch-Pagan test. A quote from R. Koenker's paper may be helpful: the asymptotic power of the Breusch and Pagan test is extremely sensitive to the kurtosis of the distribution of the residuals of the linear model underlying the test, and the asymptotic size of the test is correct only in the special case of Gaussian kurtosis. The studentized modification of the Breusch-Pagan test corrects the test statistic so that it leads to asymptotically correct significance levels for a reasonably large class of distributions of the residuals.
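
To make the studentization concrete, the following sketch computes both statistics from first principles on simulated heteroskedastic data (the variable names are ours). The classic statistic is half the explained sum of squares of the auxiliary regression of the scaled squared residuals; Koenker's version is \(n\) times the \(R^{2}\) of the auxiliary regression of the raw squared residuals.

```r
# Explicit construction of the classic and studentized BP statistics
# on simulated heteroskedastic data (a sketch; variable names are ours).
set.seed(1)
n <- 200
x <- 1:n
y <- 2 + 0.5 * x + rnorm(n, sd = 0.05 * x)    # error variance grows with x
orig_lm <- lm(y ~ x)
e2 <- residuals(orig_lm)^2

# Classic (non-studentized) BP: half the explained sum of squares of the
# auxiliary regression of e^2 / sigma_hat^2 on the fitted values.
g <- e2 / mean(e2)
aux <- lm(g ~ fitted(orig_lm))
BP_classic <- sum((fitted(aux) - mean(g))^2) / 2

# Koenker's studentized version: n times the R^2 of the auxiliary
# regression of e^2 itself on the fitted values.
BP_koenker <- n * summary(lm(e2 ~ fitted(orig_lm)))$r.squared

show(c(classic = BP_classic, studentized = BP_koenker))
```

The two constructions should reproduce lmtest::bptest(orig_lm, studentize = FALSE) and lmtest::bptest(orig_lm, studentize = TRUE), respectively.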

# Studentized Breusch-Pagan test
lmtest::bptest(formula = y~x, varformula = NULL, studentize = TRUE, data=Data_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  y ~ x
## BP = 9.6816, df = 1, p-value = 0.001861
# Alternatively
lmtest::bptest(Data_lm, studentize = TRUE)
## 
##  studentized Breusch-Pagan test
## 
## data:  Data_lm
## BP = 9.6816, df = 1, p-value = 0.001861
# Equivalently (olsrr - score test)
olsrr::ols_test_score(Data_lm, fitted_values = TRUE, rhs = FALSE)
## 
##  Score Test for Heteroskedasticity
##  ---------------------------------
##  Ho: Variance is homogenous
##  Ha: Variance is not homogenous
## 
##  Variables: fitted values of y 
## 
##          Test Summary          
##  ------------------------------
##  DF            =    1 
##  Chi2          =    9.681649 
##  Prob > Chi2   =    0.001861175
# We show also the explicit construction of the test. 
# 1. by the squared residuals and fitted values of the original linear model we build the BP auxiliary linear model.
BP_lm <- lm(y_res^2~y_fit, data=Data_df)
# summary(BP_lm)
# From the BP linear model we extract the determination coefficient R^2.
BP_lm_summary <- summary(BP_lm)
R_2 <- BP_lm_summary$r.squared
# From the determination coefficient R^2 we compute the $\chi^{2}$ statistic with one degree of freedom.
BP_Chi_2_stat <- nobs(Data_lm)*R_2
# From the $\chi^{2}$ statistic we compute the p-value.
BP_p_Chi_2 <- pchisq(BP_Chi_2_stat, 1, ncp = 0, lower.tail = FALSE, log.p = FALSE)
# We show statistic and p-value.
BP_Chi_2_test <- c(BP_Chi_2_stat, BP_p_Chi_2)
show(BP_Chi_2_test)
## [1] 9.681649095 0.001861175

Another version of the BP test.

# (olsrr - F Test)
olsrr::ols_test_f(Data_lm, fitted_values = TRUE, rhs = FALSE)
## 
##  F Test for Heteroskedasticity
##  -----------------------------
##  Ho: Variance is homogenous
##  Ha: Variance is not homogenous
## 
##  Variables: fitted values of y 
## 
##        Test Summary         
##  ---------------------------
##  Num DF     =    1 
##  Den DF     =    166 
##  F          =    10.15141 
##  Prob > F   =    0.001722844
# Equivalently
BP_lm <- lm(y_res^2~y_fit, data=Data_df)
# summary(BP_lm)
# From the BP linear model we extract the determination coefficient R^2.
BP_lm_summary <- summary(BP_lm)
R_2 <- BP_lm_summary$r.squared
# From the determination coefficient R^2 we compute the F statistic.
BP_F_stat <- (R_2/1)/((1-R_2)/(nobs(Data_lm)-2))
# From the F statistic we compute the p-value.
BP_p_F_stat <- pf(BP_F_stat, df1 = 1,  df2 = (nobs(Data_lm)-2), ncp = 0, lower.tail = FALSE, log.p = FALSE)
# We show statistic and p-value.
BP_F_test <- c(BP_F_stat,BP_p_F_stat)
show(BP_F_test)
## [1] 10.151405321  0.001722844

In light of the results of the two variants of the BP test, \(\chi^{2}\)-variant and \(F\)-variant, we have to reject the homoskedasticity null hypothesis at the \(1\%\) significance level.

However, we also consider the W test. To this end, we build an auxiliary linear model by regressing the squared residuals against the independent variables and the squared independent variables of the original model.

# library(lmtest)
var.formula <- ~ x+I(x^2)
# The operator I() inhibits the interpretation of operators such as "+", "-", "*" and "^" 
# as arithmetical operators, so that they are used as formula operators.
lmtest::bptest(formula = y ~ x, varformula = var.formula, studentize = TRUE, data=Data_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  y ~ x
## BP = 9.8387, df = 2, p-value = 0.007304
# Equivalently
# Building the White linear model.
W_lm_x <- lm(y_res^2~x+I(x^2), data=Data_df)
# We consider the summary of the linear model W_lm_x
Summ_W_lm_x <- summary(W_lm_x)
# From the summary of W_lm_x we extract the determination coefficient R^2.
R_2 <- Summ_W_lm_x$r.squared
# From the determination coefficient R^2 we compute the $\chi^{2}$ statistic with two degrees of freedom.
W_lm_x_Chi_2_stat <- nobs(W_lm_x)*R_2
# From the $\chi^{2}$ statistic we compute the p-value.
W_lm_x_p_Chi_2 <- pchisq(W_lm_x_Chi_2_stat, 2, ncp = 0, lower.tail = FALSE, log.p = FALSE)
# We show statistic and p-value.
W_lm_x_Chi_2_test <- c(W_lm_x_Chi_2_stat,W_lm_x_p_Chi_2)
show(W_lm_x_Chi_2_test)
## [1] 9.838652859 0.007304049

The White test rejects the null hypothesis of homoskedasticity at the 1% significance level.

Here is another form of the White test, applied to the linear model built by regressing the squared residuals against the fitted values and the squared fitted values of the original linear model.

# library(lmtest)
var.formula <- ~ y_fit+I(y_fit^2)
# Recall that the operator I() inhibits the interpretation of operators such as "+", "-", "*" and "^" 
# as arithmetical operators, so that they are used as formula operators.
lmtest::bptest(formula = y ~ x, varformula = var.formula, studentize = TRUE, data=Data_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  y ~ x
## BP = 9.8387, df = 2, p-value = 0.007304
W_lm_y <- lm(y_res^2~y_fit+I(y_fit^2), data=Data_df)
# We consider the summary of the linear model W_lm_y
Summ_W_lm_y <- summary(W_lm_y)
# From the summary of W_lm_y we extract the determination coefficient R^2.
R_2 <- Summ_W_lm_y$r.squared
# From the determination coefficient R^2 we compute the $\chi^{2}$ statistic with two degrees of freedom.
W_lm_y_Chi_2_stat <- nobs(W_lm_y)*R_2
# From the $\chi^{2}$ statistic we compute the p-value.
W_lm_y_p_Chi_2 <- pchisq(W_lm_y_Chi_2_stat, 2, ncp = 0, lower.tail = FALSE, log.p = FALSE)
# We show statistic and p-value.
W_lm_y_Chi_2_test <- c(W_lm_y_Chi_2_stat, W_lm_y_p_Chi_2)
show(W_lm_y_Chi_2_test)
## [1] 9.838652859 0.007304049

In light of the visual inspections and the results of the various versions of the BP and W tests, we have significant evidence to reject the null hypothesis that the residuals of the MARWS linear model are generated by a homoskedastic noise process.

The clear rejection of the null hypothesis of homoskedasticity in the noise process \(\mathbf{N}\) generating the residuals of the MARWS linear model might lead one to reject the linear model as a good model for the MARWS time series. However, before dealing with the removal of heteroskedasticity, it is interesting to show how the visual evidence of seasonality is reflected in the correlogram of the time series.

Plot of the autocorrelogram.

Data_df <-  MARWS_train_df
y <-  Data_df$MARWS_lm_res
T <- length(y)
# maxlag <- ceiling(10*log10(T))      # Default
# maxlag <- ceiling(sqrt(T)+45)       # Box-Jenkins
# maxlag <- ceiling(min(10, T/4))     # Hyndman (for data without seasonality)
maxlag <- ceiling(min(2*12, T/5))     # Hyndman https://robjhyndman.com/hyndsight/ljung-box-test/
Aut_Fun_y_acf <- acf(y, lag.max=maxlag, type="correlation", plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(T)
ci_95 <- qnorm((1+0.95)/2)/sqrt(T)
ci_99 <- qnorm((1+0.99)/2)/sqrt(T)
Aut_Fun_y_df <- data.frame(lag=Aut_Fun_y_acf$lag, acf=Aut_Fun_y_acf$acf)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Plot of the Autocorrelogram of the Residuals in the Linear Model for the Monthly AU Red Wine Sales In-Sample Set from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("lags")
x_breaks_num <- maxlag
x_binwidth <- 1
x_breaks <- Aut_Fun_y_acf$lag
x_labs <- format(x_breaks, scientific=FALSE)
y_name <- bquote("acf value")
Aut_Corr_Res_bp <- ggplot(Aut_Fun_y_df, aes(x=lag, y=acf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=acf), size=1, col="black") +
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend=TRUE, lwd=0.9, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lwd=0.9, lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend=TRUE, lwd=0.8, lty=2) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lwd=0.8, lty=2) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend=TRUE, lwd=0.8, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lwd=0.8, lty=4) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs) +
  scale_y_continuous(name="acf value", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_colour_manual(name="Conf. Inter.", labels=c("90%","95%","99%"), values=c(CI_90="green", CI_95="blue", CI_99="red"),
                      guide=guide_legend(override.aes=list(linetype=c("dotted", "dashed", "dotdash")))) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust=0.5), 
        plot.subtitle=element_text(hjust= 0.5),
        plot.caption=element_text(hjust=1.0),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Aut_Corr_Res_bp)

The number of spikes at positive lags crossing the confidence lines is far beyond the statistical tolerance. In fact, we have twelve spikes crossing all the \(99\%\), \(95\%\), and \(90\%\) confidence lines. Recall that the tolerances are ceiling(maxlag \(\ast 0.01\))=ceiling(\(24 \ast 0.01\))=\(1\), ceiling(maxlag \(\ast 0.05\))=ceiling(\(24 \ast 0.05\))=\(2\), and ceiling(maxlag \(\ast 0.10\))=ceiling(\(24 \ast 0.10\))=\(3\), respectively. Therefore, we have clear visual evidence for autocorrelation.

The presence of seasonality is reflected in the periodicity of the peaks crossing the confidence lines.
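
As a benchmark for the tolerance computation, the following sketch counts the spikes of a correlogram outside the 95% band for a simulated white-noise path, where only about 5% of the lags are expected to cross by pure chance (the variable names are ours):

```r
# Counting correlogram spikes outside the 95% confidence band
# for a simulated white-noise path (a sketch; names are ours).
set.seed(42)
wn <- rnorm(300)
T_len <- length(wn)
maxlag <- 24
acf_wn <- acf(wn, lag.max = maxlag, type = "correlation", plot = FALSE)
rho <- acf_wn$acf[-1]                      # drop the trivial lag-0 value
ci_95 <- qnorm((1 + 0.95) / 2) / sqrt(T_len)
crossings <- sum(abs(rho) > ci_95)
# For white noise, about 5% of maxlag lags may cross by pure chance,
# which is the origin of the tolerance ceiling(maxlag * 0.05).
show(crossings)
```

Twelve crossings out of twenty-four lags, as in our residuals, is therefore far outside what chance alone would produce.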

We also draw the partial autocorrelogram.

Data_df <-  MARWS_train_df
y <-  Data_df$MARWS_lm_res
T <- length(y)
# maxlag <- ceiling(10*log10(T))      # Default
# maxlag <- ceiling(sqrt(T)+45)       # Box-Jenkins
# maxlag <- ceiling(min(10, T/4))     # Hyndman (for data without seasonality)
maxlag <- ceiling(min(2*12, T/5))     # Hyndman https://robjhyndman.com/hyndsight/ljung-box-test/
Part_Aut_Fun_y_pacf <- pacf(y, lag.max=maxlag, plot=FALSE)   # pacf always computes the partial ACF; no type argument is needed
ci_90 <- qnorm((1+0.90)/2)/sqrt(T)
ci_95 <- qnorm((1+0.95)/2)/sqrt(T)
ci_99 <- qnorm((1+0.99)/2)/sqrt(T)
Part_Aut_Fun_y_df <- data.frame(lag=Part_Aut_Fun_y_pacf$lag, pacf=Part_Aut_Fun_y_pacf$acf)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Plot of the Partial Autocorrelogram of the Residuals in the Linear Model for the Monthly AU Red Wine Sales In-Sample Set from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("lags")
x_breaks_num <- maxlag
x_binwidth <- 1
x_breaks <- Part_Aut_Fun_y_pacf$lag
x_labs <- format(x_breaks, scientific=FALSE)
y_name <- bquote("pacf value")
Part_Aut_Corr_Res_bp <- ggplot(Part_Aut_Fun_y_df, aes(x=lag, y=pacf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=pacf), size=1, col="black") +
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend=TRUE, lwd=0.9, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lwd=0.9, lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend=TRUE, lwd=0.8, lty=2) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lwd=0.8, lty=2) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend=TRUE, lwd=0.8, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lwd=0.8, lty=4) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs) +
  scale_y_continuous(name=y_name, breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_colour_manual(name="Conf. Inter.", labels=c("90%","95%","99%"), values=c(CI_90="green", CI_95="blue", CI_99="red"),
                      guide=guide_legend(override.aes=list(linetype=c("dotted", "dashed", "dotdash")))) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust=0.5), 
        plot.subtitle=element_text(hjust= 0.5),
        plot.caption=element_text(hjust=1.0),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Part_Aut_Corr_Res_bp)
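The dotted, dashed, and dot-dash lines in the two correlograms are the usual large-sample bands: under the white-noise null, each sample (partial) autocorrelation is approximately \(N\left(0,1/T\right)\), so the bands are \(\pm z_{(1+c)/2}/\sqrt{T}\), which is exactly what the qnorm lines in the chunks above compute. As a cross-check, here is a minimal stdlib-only Python sketch of the same quantity (the function name is ours; \(T=168\) is the length of the in-sample set):

```python
from math import sqrt
from statistics import NormalDist

def acf_band(T, level):
    """Half-width of the large-sample confidence band
    +/- z_{(1+level)/2} / sqrt(T) for sample (partial)
    autocorrelations under the white-noise null."""
    return NormalDist().inv_cdf((1 + level) / 2) / sqrt(T)

T = 168  # length of the MARWS in-sample set
bands = {lvl: acf_band(T, lvl) for lvl in (0.90, 0.95, 0.99)}
```

For \(T=168\) this gives bands of roughly \(\pm 0.127\), \(\pm 0.151\), and \(\pm 0.199\) at the \(90\%\), \(95\%\), and \(99\%\) levels.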

The autocorrelation in the residuals of the linear model MARWS_lm can be computationally confirmed by the Ljung-Box (LB) test.

Data_df <-  MARWS_train_df
y <- Data_df$MARWS_lm_res
T <- length(y)
maxlag <- ceiling(min(2*12, T/5))     # Hyndman https://robjhyndman.com/hyndsight/ljung-box-test/
y_LB <- LjungBoxTest(y, k=1, lag.max=maxlag, StartLag=1, SquaredQ=FALSE)
show(y_LB)
##   m     Qm       pvalue
##   1  29.94 4.449775e-08
##   2  31.40 2.096880e-08
##   3  31.50 1.443253e-07
##   4  33.08 3.092302e-07
##   5  52.29 1.197821e-10
##   6 118.26 0.000000e+00
##   7 140.75 0.000000e+00
##   8 141.33 0.000000e+00
##   9 141.59 0.000000e+00
##  10 143.19 0.000000e+00
##  11 161.74 0.000000e+00
##  12 257.35 0.000000e+00
##  13 281.69 0.000000e+00
##  14 283.33 0.000000e+00
##  15 283.51 0.000000e+00
##  16 285.69 0.000000e+00
##  17 307.47 0.000000e+00
##  18 372.43 0.000000e+00
##  19 397.22 0.000000e+00
##  20 400.89 0.000000e+00
##  21 400.93 0.000000e+00
##  22 401.97 0.000000e+00
##  23 420.22 0.000000e+00
##  24 503.88 0.000000e+00
plot(y_LB[,3], main="Ljung-Box Q Test", ylab="P-values", xlab="Lag")

The null hypothesis that the residuals have been generated by independent and identically distributed noise is rejected at any conventional significance level.
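The statistic tabulated in the Qm column is \(Q_{m}=T\left(T+2\right)\sum_{k=1}^{m}\frac{r_{k}^{2}}{T-k}\), where \(r_{k}\) is the lag-\(k\) sample autocorrelation of the residuals; under the null it is compared with a \(\chi^{2}\) quantile whose degrees of freedom are, as we read FitAR's convention, reduced by the number of fitted parameters (the k=1 argument above). A minimal stdlib-only Python sketch of the statistic (function names are ours; p-values would additionally require the \(\chi^{2}\) distribution function):

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag with the usual 1/T normalisation."""
    T = len(y)
    mean = sum(y) / T
    d = [v - mean for v in y]
    c0 = sum(v * v for v in d) / T
    return [sum(d[t] * d[t + k] for t in range(T - k)) / T / c0
            for k in range(1, max_lag + 1)]

def ljung_box_q(y, m):
    """Ljung-Box statistic Q_m = T(T+2) * sum_{k<=m} r_k^2 / (T-k)."""
    T = len(y)
    r = sample_acf(y, m)
    return T * (T + 2) * sum(r[k - 1] ** 2 / (T - k) for k in range(1, m + 1))
```

Since every term in the sum is non-negative, \(Q_{m}\) is non-decreasing in \(m\), as visible in the table above.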

The autocorrelation in the residuals is an even stronger reason than heteroskedasticity for rejecting the linear model.

In the end, we have to reject the linear model as a good model for the time series MARWS. Therefore, for the sake of simplicity, we remove the data characterizing the linear model from MARWS_train_df.

MARWS_train_df <-  subset(MARWS_train_df, select=-c(MARWS_lm_fit, MARWS_lm_res, MARWS_lm_sqrt_abs_res))
head(MARWS_train_df)
##   t Year Month  RWS
## 1 1 1980   Jan  464
## 2 2 1980   Feb  675
## 3 3 1980   Mar  703
## 4 4 1980   Apr  887
## 5 5 1980   May 1139
## 6 6 1980   Jun 1077
tail(MARWS_train_df)
##       t Year Month  RWS
## 163 163 1993   Jul 3057
## 164 164 1993   Aug 3330
## 165 165 1993   Sep 1896
## 166 166 1993   Oct 2096
## 167 167 1993   Nov 2374
## 168 168 1993   Dec 2535

As mentioned above, applying a non-linear Box-Cox transformation to the time series can help remove heteroskedasticity. Typically, one applies a logarithm or a square root transformation, but an optimal choice of the transformation can be made by a computational procedure.

Let \(\left(y_{t}\right)_{t=0}^{T}\equiv\mathbf{y}\) be a time series of length \(T\), for any \(T\in\mathbb{N}\).

Definition 3.15 (Box-Cox transformation) We call the Box-Cox transform of \(\mathbf{y}\) with exponent parameter \(\lambda_{1}\) and shift parameter \(\lambda_{2}\) the time series \(\left(\tilde{y}_{t}\left(\lambda_{1},\lambda_{2}\right)\right)_{t=0}^{T}\equiv \tilde{\mathbf{y}}\left(\lambda_{1},\lambda_{2}\right)\) given by \[\begin{equation} \tilde{y}_{t}\left(\lambda_{1},\lambda_{2}\right)\overset{\text{def}}{=} \left\{ \begin{array} [c]{ll} \frac{\left(y_{t}+\lambda_{2}\right)^{\lambda_{1}}-1}{\lambda_{1}}, & \text{if }\lambda_{1}\neq0,\ \lambda_{2}>-\min_{t=0,\dots,T}\left\{y_{t}\right\}, \\ \ln\left(y_{t}+\lambda_{2}\right), & \text{if }\lambda_{1}=0,\ \lambda_{2}>-\min_{t=0,\dots,T}\left\{y_{t}\right\}, \end{array} \right. \quad\forall t=0,\dots,T. \tag{3.41} \end{equation}\]

Note that the shift parameter \(\lambda_{2}\) plays no role other than making the argument of the power or logarithm transformation positive.
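Although our computations use R, a minimal Python sketch of Equation (3.41) may help fix ideas; in particular, for \(\lambda_{1}=1/2\) and \(\lambda_{2}=0\) the transform reduces to \(2\sqrt{y_{t}}-2\), an affine function of the square root (the function name is ours):

```python
from math import log

def box_cox(y, lam1, lam2=0.0):
    """Box-Cox transform of Equation (3.41):
    ((y_t + lam2)**lam1 - 1)/lam1 for lam1 != 0, log(y_t + lam2) for lam1 == 0.
    Requires y_t + lam2 > 0 for every observation."""
    if any(v + lam2 <= 0 for v in y):
        raise ValueError("lam2 must exceed -min(y)")
    if lam1 == 0:
        return [log(v + lam2) for v in y]
    return [((v + lam2) ** lam1 - 1) / lam1 for v in y]
```

For instance, box_cox([1, 4, 9], 0.5) returns [0.0, 2.0, 4.0], i.e. \(2\sqrt{y_{t}}-2\).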

Computationally, the optimal parameter \(\lambda\equiv\lambda_{1}\) for the Box-Cox transformation of the MARWS time series can be determined by the following code chunk, which uses the Guerrero method,

# library(forecast)
# Guerrero method
y <-  MARWS_train_df$RWS
y_BCT_Guerr_lambda <- forecast::BoxCox.lambda(y, method = "guerrero")
tilde_y_BCT_Guerr_lambda <- forecast::BoxCox(y, lambda=y_BCT_Guerr_lambda)
show(y_BCT_Guerr_lambda)
## [1] 0.4771095

or by this other code chunk, which uses the log-likelihood method.

# library(forecast)
# loglikelihood method
y <-  MARWS_train_df$RWS
y_BCT_loglik_lambda <- forecast::BoxCox.lambda(y, method = "loglik")
tilde_y_BCT_loglik_lambda <- forecast::BoxCox(y, lambda=y_BCT_loglik_lambda)
show(y_BCT_loglik_lambda)
## [1] 0.5
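For intuition on the first of the two chunks above: as we read Guerrero's (1993) proposal, the basis of forecast::BoxCox.lambda with method = "guerrero", the series is split into blocks of length equal to the seasonal period, and \(\lambda\) is chosen to minimize the coefficient of variation of \(s_{i}/m_{i}^{1-\lambda}\) across blocks, where \(m_{i}\) and \(s_{i}\) are the block means and standard deviations. A hedged grid-search sketch in Python (the function name and the grid are ours, not the package's):

```python
from statistics import mean, stdev

def guerrero_lambda(y, period=12, grid=None):
    """Grid-search sketch of Guerrero's method: pick the lambda that
    minimises the coefficient of variation of s_i / m_i**(1 - lambda)
    across non-overlapping blocks of length `period`."""
    if grid is None:
        grid = [i / 100 for i in range(-100, 201)]  # lambda in [-1, 2]
    blocks = [y[i:i + period] for i in range(0, len(y) - period + 1, period)]
    stats = [(mean(b), stdev(b)) for b in blocks]

    def cv(lam):
        ratios = [s / m ** (1 - lam) for m, s in stats]
        return stdev(ratios) / mean(ratios)

    return min(grid, key=cv)
```

When the within-block spread is exactly proportional to the block level (a multiplicative pattern), the criterion is minimized at \(\lambda=0\), the logarithm case.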

We store the results of the Box-Cox transformations in the data frame MARWS_train_df and also add the log- and square-root-transformed time series. Note that, for the MARWS time series, the Box-Cox transformation with parameter \(\lambda_{1}=1/2\) determined by the log-likelihood method is an affine function of the square root transformation, since it equals \(2\sqrt{y_{t}}-2\). In general this is not the case, but the square root transformation is usually considered alongside the other transformations for its simplicity. For illustrative purposes, in determining a model for the MARWS time series we will treat the square-root transformation as if it were distinct from the Box-Cox transformation with parameter \(\lambda_{1}=1/2\).

MARWS_train_df <- add_column(MARWS_train_df, MARWS_BCT_Guerr=as.vector(tilde_y_BCT_Guerr_lambda), MARWS_BCT_loglik=as.vector(tilde_y_BCT_loglik_lambda), MARWS_log=log(MARWS_train_df$RWS), MARWS_sqrt=sqrt(MARWS_train_df$RWS), .after="RWS")
head(MARWS_train_df)
##   t Year Month  RWS MARWS_BCT_Guerr MARWS_BCT_loglik MARWS_log MARWS_sqrt
## 1 1 1980   Jan  464        37.13267         41.08132  6.139885   21.54066
## 2 2 1980   Feb  675        44.81451         49.96152  6.514713   25.98076
## 3 3 1980   Mar  703        45.73306         51.02829  6.555357   26.51415
## 4 4 1980   Apr  887        51.34379         57.56509  6.787845   29.78255
## 5 5 1980   May 1139        58.11541         65.49815  7.037906   33.74907
## 6 6 1980   Jun 1077        56.52878         63.63536  6.981935   32.81768
tail(MARWS_train_df)
##       t Year Month  RWS MARWS_BCT_Guerr MARWS_BCT_loglik MARWS_log MARWS_sqrt
## 163 163 1993   Jul 3057        94.34235        108.58029  8.025189   55.29014
## 164 164 1993   Aug 3330        98.35952        113.41230  8.110728   57.70615
## 165 165 1993   Sep 1896        74.68788         85.08616  7.547502   43.54308
## 166 166 1993   Oct 2096        78.45104         89.56419  7.647786   45.78209
## 167 167 1993   Nov 2374        83.38235         95.44742  7.772332   48.72371
## 168 168 1993   Dec 2535        86.10072         98.69757  7.837949   50.34878

We then plot the transformed time series.

Scatter and line plot of the Box-Cox transformation of MARWS time series (\(\lambda\) chosen by the Guerrero method).

# The scatter plot
Data_df <- MARWS_train_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of the Box-Cox transformed Monthly AU Red Wine Sales In-Sample Set (Guerrero Method) from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote("sales (transf. kliters)")
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_BCT_Guerr)-min(Data_df$MARWS_BCT_Guerr))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_BCT_Guerr)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_BCT_Guerr)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Transf. Red Wine Monthly Sales")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_sp_1 <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_BCT_Guerr, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_BCT_Guerr, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=t, y=MARWS_BCT_Guerr, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                     guide=guide_legend(override.aes=list(shape=c(19,NA,NA), linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
       axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
       legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_sp_1)

# The line plot
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Box-Cox transformed Monthly AU Red Wine Sales In-Sample Set (Guerrero Method) from ", .(First_Date), " to ", .(Last_Date))))
MARWS_lp_1 <- ggplot(Data_df) +
 geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_BCT_Guerr, color="col_3"),
            method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
 geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_BCT_Guerr, color="col_2"),
             method = "loess", formula = y ~ x, se=FALSE) +
 geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=MARWS_BCT_Guerr, color="col_1", group=1)) +
 scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
 scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                    sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
 ggtitle(title_content) +
 labs(subtitle=subtitle_content, caption=caption_content) +
 scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
 theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
       axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
       legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_lp_1)

Scatter and line plot of the Box-Cox transformation of MARWS time series (\(\lambda\) chosen by the log-likelihood method).

# The scatter plot
Data_df <- MARWS_train_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of the Box-Cox transformed Monthly AU Red Wine Sales In-Sample Set (Log-Likelihood Method) from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote("sales (transf. kliters)")
y_breaks_num <- 10
y_binwidth <- round((max(na.omit(Data_df$MARWS_BCT_loglik))-min(na.omit(Data_df$MARWS_BCT_loglik)))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(na.omit(Data_df$MARWS_BCT_loglik))/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(na.omit(Data_df$MARWS_BCT_loglik))/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Transf. Red Wine Monthly Sales")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_sp_2 <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_BCT_loglik, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_BCT_loglik, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=t, y=MARWS_BCT_loglik, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_sp_2)

# The line plot
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Box-Cox transformed Monthly AU Red Wine Sales In-Sample Set (Log-Likelihood Method) from ", .(First_Date), " to ", .(Last_Date))))
MARWS_lp_2 <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_BCT_loglik, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_BCT_loglik, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=MARWS_BCT_loglik, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_lp_2)

Scatter and line plot of the (Box-Cox) log transformation of MARWS time series.

# The scatter plot
Data_df <- MARWS_train_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of the Box-Cox transformed Monthly AU Red Wine Sales In-Sample Set (Log Transformation) from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
y_name <- bquote("sales (log. kliters)")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_log)-min(Data_df$MARWS_log))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_log)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_log)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Log. Red Wine Monthly Sales")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_sp_3<- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_log, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_log, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=t, y=MARWS_log, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_sp_3)

# The line plot
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Box-Cox transformed Monthly AU Red Wine Sales In-Sample Set (Log Transformation) from ", .(First_Date), " to ", .(Last_Date))))
MARWS_lp_3 <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_log, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_log, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=MARWS_log, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_lp_3)

Scatter and line plot of the (Box-Cox) square root transformation of MARWS time series.

# The scatter plot
Data_df <- MARWS_train_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of the Box-Cox transformed Monthly AU Red Wine Sales In-Sample Set (Square Root Transformation) from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
y_name <- bquote("sales (sqrt kliters)")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_sqrt)-min(Data_df$MARWS_sqrt))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_sqrt)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_sqrt)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Sqrt Red Wine Monthly Sales")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_sp_4 <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_sqrt, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_sqrt, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=t, y=MARWS_sqrt, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_sp_4)

# The Line plot
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Box-Cox transformed Monthly AU Red Wine Sales In-Sample Set (Square Root Transformation) from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
y_name <- bquote("sales (sqrt kliters)")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_sqrt)-min(Data_df$MARWS_sqrt))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_sqrt)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_sqrt)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Sqrt Red Wine Monthly Sales")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_lp_4 <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=MARWS_sqrt, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=MARWS_sqrt, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=MARWS_sqrt, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_lp_4)

Summary of the line plots.

# Note: par(mfrow=...) has no effect on ggplot objects; we arrange them on a grid instead
gridExtra::grid.arrange(MARWS_lp_1, MARWS_lp_2, MARWS_lp_3, MARWS_lp_4, nrow=2, ncol=2)

After checking the kurtosis, we apply the Breusch-Pagan and White tests to the transformations of the MARWS time series.

Data_BP_df  <- data.frame(x=MARWS_train_df$t, y_Guerr=MARWS_train_df$MARWS_BCT_Guerr, y_Loglik=MARWS_train_df$MARWS_BCT_loglik, y_log=MARWS_train_df$MARWS_log,  y_sqrt=MARWS_train_df$MARWS_sqrt)
head(Data_BP_df)
##   x  y_Guerr y_Loglik    y_log   y_sqrt
## 1 1 37.13267 41.08132 6.139885 21.54066
## 2 2 44.81451 49.96152 6.514713 25.98076
## 3 3 45.73306 51.02829 6.555357 26.51415
## 4 4 51.34379 57.56509 6.787845 29.78255
## 5 5 58.11541 65.49815 7.037906 33.74907
## 6 6 56.52878 63.63536 6.981935 32.81768
# Data_BP_Guerr_lm <- lm(y_Guerr~x, data=Data_BP_df)
# Data_BP_Loglik_lm <- lm(y_Loglik~x, data=Data_BP_df)
# Data_BP_log_lm <- lm(y_log~x, data=Data_BP_df)
# Data_BP_sqrt_lm <- lm(y_sqrt~x, data=Data_BP_df)

We check the kurtosis of the Box-Cox transformation (Guerrero method) of the MARWS time series. We use a bootstrap confidence interval, since we do not assume that the transformed time series is Gaussian.

set.seed(12345)
y_Guerr_kurt <- DescTools::Kurt(Data_BP_df$y_Guerr, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "bca", R=1000) 
show(y_Guerr_kurt)
##       kurt     lwr.ci     upr.ci 
## -0.6772642 -0.9823622 -0.2934626
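DescTools::Kurt reports excess kurtosis, so a Gaussian series has value \(0\); method = 2 selects, to our understanding, a bias-corrected (SAS/SPSS-style) estimator. The uncorrected moment version is simply \(g_{2}=m_{4}/m_{2}^{2}-3\), sketched below in stdlib Python (the function name is ours):

```python
def excess_kurtosis(y):
    """Moment estimator g2 = m4/m2**2 - 3 of excess kurtosis.
    DescTools method 2 applies a small-sample bias correction on top."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n
    return m4 / m2 ** 2 - 3
```

Negative values indicate a platykurtic (thin-tailed) series, positive values a leptokurtic (heavy-tailed) one.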

Since we find no excess kurtosis (actually, the transformed time series seems to be platykurtic, since the point \(0\) is not in the confidence interval), we apply the unstudentized version of the Breusch-Pagan test.

y_Guerr_unstud_BP <- lmtest::bptest(formula = y_Guerr~x, varformula = NULL, studentize = FALSE, data=Data_BP_df)
show(y_Guerr_unstud_BP)
## 
##  Breusch-Pagan test
## 
## data:  y_Guerr ~ x
## BP = 2.9134, df = 1, p-value = 0.08784

We do the same with the White test.

y_Guerr_unstud_W <- lmtest::bptest(formula = y_Guerr~x, varformula = y_Guerr~x+I(x^2), studentize = FALSE, data=Data_BP_df)
show(y_Guerr_unstud_W)
## 
##  Breusch-Pagan test
## 
## data:  y_Guerr ~ x
## BP = 2.9884, df = 2, p-value = 0.2244
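The unstudentized statistic (as we understand lmtest::bptest with studentize = FALSE) is the original Breusch-Pagan LM test: regress the squared OLS residuals, scaled by their mean \(\hat{\sigma}^{2}\), on the variance regressors and take half the explained sum of squares, to be compared with a \(\chi^{2}\) quantile with as many degrees of freedom as variance regressors. A Python sketch for the simple-regression case (function names are ours):

```python
def ols_fit(x, y):
    """Closed-form OLS for a simple regression; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def bp_unstudentized(x, y):
    """Original (unstudentized) Breusch-Pagan LM statistic for a simple
    regression of y on x; compare with a chi-square(1) quantile."""
    n = len(x)
    a, b = ols_fit(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    s2 = sum(e2) / n            # ML estimate of the error variance
    if s2 == 0:
        return 0.0              # perfect fit: nothing to test
    g = [v / s2 for v in e2]    # scaled squared residuals
    a2, b2 = ols_fit(x, g)      # auxiliary regression of g on x
    gbar = sum(g) / n
    ess = sum((a2 + b2 * xi - gbar) ** 2 for xi in x)
    return ess / 2              # LM statistic
```

A residual spread that grows with the regressor yields a positive statistic, while a perfectly homoskedastic fit yields zero.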

Second, we consider the Box-Cox transformation (log-likelihood method) of the MARWS time series.

set.seed(12345)
y_Loglik_kurt <- DescTools::Kurt(Data_BP_df$y_Loglik, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "bca", R=1000)
show(y_Loglik_kurt)
##       kurt     lwr.ci     upr.ci 
## -0.6795489 -0.9856308 -0.2971035

In this case too we find no excess kurtosis, so we apply the unstudentized versions of the tests.

y_unstud_Loglik_BP <- lmtest::bptest(formula = y_Loglik~x, varformula = NULL, studentize = FALSE, data=Data_BP_df)
show(y_unstud_Loglik_BP)
## 
##  Breusch-Pagan test
## 
## data:  y_Loglik ~ x
## BP = 3.247, df = 1, p-value = 0.07155
y_unstud_Loglik_W <- lmtest::bptest(formula = y_Loglik~x, varformula = y_Loglik~x+I(x^2), studentize = FALSE, data=Data_BP_df)
show(y_unstud_Loglik_W)
## 
##  Breusch-Pagan test
## 
## data:  y_Loglik ~ x
## BP = 3.3191, df = 2, p-value = 0.1902

Third, we consider the log transformation of the MARWS time series.

set.seed(12345)
y_log_kurt <- DescTools::Kurt(Data_BP_df$y_log, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "bca", R=1000)
show(y_log_kurt)
##       kurt     lwr.ci     upr.ci 
## -0.3470708 -0.7683198  0.3772547

Also in this case we find no excess kurtosis (the transformed time series seems to be mesokurtic, since the point \(0\) is in the confidence interval), and we apply the unstudentized versions of the tests.

y_unstud_log_BP <- lmtest::bptest(formula = y_log~x, varformula = NULL, studentize = FALSE, data=Data_BP_df)
show(y_unstud_log_BP)
## 
##  Breusch-Pagan test
## 
## data:  y_log ~ x
## BP = 0.15823, df = 1, p-value = 0.6908
y_unstud_log_W <- lmtest::bptest(formula = y_log~x, varformula = y_log~x+I(x^2), studentize = FALSE, data=Data_BP_df)
show(y_unstud_log_W)
## 
##  Breusch-Pagan test
## 
## data:  y_log ~ x
## BP = 0.65873, df = 2, p-value = 0.7194

Finally, we consider the square root transformation of the MARWS time series.

set.seed(12345)
y_sqrt_kurt <- DescTools::Kurt(Data_BP_df$y_sqrt, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "bca", R=1000)
show(y_sqrt_kurt)
##       kurt     lwr.ci     upr.ci 
## -0.6795489 -0.9856308 -0.2971035

Also in this case, we apply the unstudentized versions of the tests.

y_unstud_sqrt_BP <- lmtest::bptest(formula = y_sqrt~x, varformula = NULL, studentize = FALSE, data=Data_BP_df)
show(y_unstud_sqrt_BP)
## 
##  Breusch-Pagan test
## 
## data:  y_sqrt ~ x
## BP = 3.247, df = 1, p-value = 0.07155
y_unstud_sqrt_W <- lmtest::bptest(formula = y_sqrt~x, varformula = y_sqrt~x+I(x^2), studentize = FALSE, data=Data_BP_df)
show(y_unstud_sqrt_W)
## 
##  Breusch-Pagan test
## 
## data:  y_sqrt ~ x
## BP = 3.3191, df = 2, p-value = 0.1902

Note that the kurtosis estimate, the Breusch-Pagan test, and the White test return the same results for the square root transformation as for the Box-Cox transformation of the MARWS time series with parameter \(\lambda_{1}=1/2\).
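
This is no coincidence: the Box-Cox transformation with \(\lambda=1/2\) is \((y^{1/2}-1)/(1/2)=2\sqrt{y}-2\), an affine function of the square root transformation, and the excess kurtosis as well as the unstudentized BP and White statistics are invariant under affine transformations of the response. The following minimal sketch, on a hypothetical toy series (not the MARWS data), illustrates the invariance of the unstudentized BP statistic:

```r
# Toy positive series (hypothetical data): the BP statistic is identical
# for sqrt(y) and for its affine image (sqrt(y) - 1)/0.5 = 2*sqrt(y) - 2.
set.seed(1)
x <- 1:100
y <- exp(0.02*x + rnorm(100, sd = 0.2))           # toy positive series
toy_df <- data.frame(x = x,
                     y_sqrt = sqrt(y),            # square root transform
                     y_BC   = (sqrt(y) - 1)/0.5)  # Box-Cox with lambda = 1/2
bp_sqrt <- lmtest::bptest(y_sqrt ~ x, studentize = FALSE, data = toy_df)
bp_BC   <- lmtest::bptest(y_BC   ~ x, studentize = FALSE, data = toy_df)
show(c(bp_sqrt$statistic, bp_BC$statistic))       # the two values coincide
```

Indeed, rescaling the response rescales the residuals and their estimated variance by the same factor, leaving the auxiliary regression of the scaled squared residuals unchanged.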

From a comparative visual inspection of the scatter plots and from the results of the Breusch-Pagan and White tests, the logarithm transformation of the MARWS time series appears to be the most promising in terms of regularization power. Therefore, we proceed with our analysis using this transformation.

We then consider the linear model MARWS_log_lm for the logarithm transform and its residuals.

MARWS_log_lm <- lm(MARWS_log~t, data=MARWS_train_df)
MARWS_log_lm_summ <- summary(MARWS_log_lm)
show(MARWS_log_lm_summ)
## 
## Call:
## lm(formula = MARWS_log ~ t, data = MARWS_train_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92154 -0.10220  0.02203  0.17208  0.59948 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.7949963  0.0422452   160.8   <2e-16 ***
## t           0.0058528  0.0004336    13.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2726 on 166 degrees of freedom
## Multiple R-squared:  0.5233, Adjusted R-squared:  0.5204 
## F-statistic: 182.2 on 1 and 166 DF,  p-value: < 2.2e-16
MARWS_log_lm_res <- residuals(MARWS_log_lm)
show(MARWS_log_lm_res[1:10])
##           1           2           3           4           5           6 
## -0.66096456 -0.29198920 -0.25719778 -0.03056247  0.21364573  0.15182167 
##           7           8           9          10 
##  0.34790492  0.29704843  0.17341261  0.01652928

Now, the five summary points of the Residuals section of the MARWS_log_lm model appear more symmetric and fit the corresponding summary points of the zero-centered Gaussian distribution with standard deviation equal to the RSE better. In fact, considering that

RSE <- MARWS_log_lm_summ$sigma
show(c(round(qnorm(0.25, mean = 0, sd = RSE, lower.tail = TRUE),5), round(qnorm(0.75, mean = 0, sd = RSE, lower.tail = TRUE),5)))
## [1] -0.18384  0.18384
show(c(round(-3*RSE,5),round(3*RSE,5)))
## [1] -0.81768  0.81768

for such a Gaussian distribution we have the following theoretical values (to be compared with the computed values in the last row): \[\begin{equation} \begin{array}{ccccc} Min\,(99.73\%) & 1Q & Median & 3Q & Max\,(99.73\%)\\ -0.81768 & -0.18384 & 0.00000 & 0.18384 & 0.81768\\ -0.92154 & -0.10220 & 0.02203 & 0.17208 & 0.59948 \end{array} \end{equation}\] where \[\begin{equation} \begin{array}{c} Min\,(99.73\%)= Mean-3*RSE, \qquad Max\,(99.73\%)= Mean+3*RSE,\\ 1Q = qnorm(0.25, mean = 0, sd = RSE, lower.tail = TRUE),\\ 3Q = qnorm(0.75, mean = 0, sd = RSE, lower.tail = TRUE).\\ \end{array} \end{equation}\]

Recall that, under the assumption that the residuals of the linear model MARWS_log_lm are independently sampled from a Gaussian distribution with mean \(\mu\) and standard deviation \(\sigma\), retrieving the estimates for 1Q and 3Q from the summary of the MARWS_log_lm residuals, by means of

MARWS_log_lm_res_summ <- summary(MARWS_log_lm$residuals)
show(MARWS_log_lm_res_summ)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.92154 -0.10220  0.02203  0.00000  0.17208  0.59948

We obtain the quartile-estimates for \(\mu\) and \(\sigma\), respectively, given by

quart_mean <- as.vector(MARWS_log_lm_res_summ[2]+MARWS_log_lm_res_summ[5])/2
show(quart_mean)
## [1] 0.03494

and

quart_sd <- as.vector(MARWS_log_lm_res_summ[5]-MARWS_log_lm_res_summ[2])/(2*0.67448)
show(quart_sd)
## [1] 0.203327
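
The constant \(0.67448\) in the denominator is (approximately) the \(0.75\) quantile of the standard Gaussian distribution, so that, for Gaussian data, the half-interquartile range divided by \(qnorm(0.75)\) is a consistent estimator of \(\sigma\). A minimal sketch with simulated data (hypothetical, not the MARWS residuals):

```r
# qnorm(0.75) is approximately 0.6744898: for a Gaussian sample, the
# half-interquartile range divided by qnorm(0.75) recovers sigma.
show(qnorm(0.75, mean = 0, sd = 1, lower.tail = TRUE))
set.seed(12345)
z <- rnorm(100000, mean = 0, sd = 2)              # simulated Gaussian data
q <- quantile(z, probs = c(0.25, 0.75))
show(as.vector(q[2] - q[1])/(2*qnorm(0.75)))      # close to the true sd = 2
```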

As a consequence, if the residuals of the linear model MARWS_log_lm were generated by independent sampling from a Gaussian distribution, then the mean and the standard deviation of such a Gaussian distribution would take values rather close to zero and RSE, respectively. Again, to answer the question “how close?”, we consider the confidence intervals of the mean and the standard deviation of the residuals under the assumption that they are generated by independent sampling from a Gaussian distribution.

MARWS_log_lm_res_t_test <- t.test(x=MARWS_log_lm_res, alternative = "two.sided", mu=0, conf.level=0.95)
show(c(round(MARWS_log_lm_res_t_test$estimate, digits=6),round(MARWS_log_lm_res_t_test$conf.int, digits=6),round(MARWS_log_lm_res_t_test$p.value, digits=6)))
## mean of x                               
##  0.000000 -0.041391  0.041391  1.000000
MARWS_log_lm_res_chisq_test <- EnvStats::varTest(x=MARWS_log_lm_res, alternative="two.sided", sigma.squared=MARWS_log_lm_summ$sigma^2, conf.level=0.95)
show(c(round(MARWS_log_lm_res_chisq_test$estimate, digits=6),round(MARWS_log_lm_res_chisq_test$conf.int, digits=6),round(MARWS_log_lm_res_chisq_test$p.value, digits=6)))
## variance      LCL      UCL          
## 0.073844 0.060251 0.092644 0.985383

The mean value \(quart\_mean = 0.03494\), estimated via the empirical quartiles under the assumption of Gaussian distributed residuals of the linear model, is in the \(95\%\) confidence interval \([-0.041391, 0.041391]\) of the zero residual mean, but the corresponding variance \(quart\_sd^{2}=0.041342\) is not in the \(95\%\) confidence interval \([0.060251, 0.092644]\) of the \(RSE^{2}=0.074311\) residual variance. This suggests that the assumption of Gaussianity for the distribution which generates the residuals of the linear model is unlikely.

For completeness, we also compute the skewness and the kurtosis, jointly with the \(95\%\) confidence intervals derived under the assumption of a Gaussian distributed data set.

set.seed(12345)
Skew_Gauss <- DescTools::Skew(x=MARWS_log_lm_res, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "classic")
show(Skew_Gauss)
##   skewness     lwr.ci     upr.ci 
## -0.7304363 -0.3671480  0.3671480
set.seed(12345)
Kurt_Gauss <- DescTools::Kurt(x=MARWS_log_lm_res, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "classic")
show(Kurt_Gauss)
##   kurtosis     lwr.ci     upr.ci 
##  0.6481521 -0.7301426  0.7301426

Thus, the residuals of the linear model MARWS_log_lm appear to be left-skewed, although mesokurtic, at the \(95\%\) confidence level.

Hence, the basic analysis of the residuals of the linear model MARWS_log_lm does provide some evidence to reject the assumption that the residuals are generated by independent sampling from the zero-centered Gaussian distribution with standard deviation \(RSE=0.2726\). This should not be unexpected, because a Box-Cox transformation can remove heteroskedasticity, but not trend or seasonality.

A visual inspection of the residuals is obtained by the Residuals vs Fitted Values draft scatter plot.

plot(MARWS_log_lm,1)

A visual inspection of the Residual Scale-Location plot may also be interesting.

plot(MARWS_log_lm,3)

Alternatively, we build the MARWS_train_log_df data frame.

head(MARWS_train_df)
##   t Year Month  RWS MARWS_BCT_Guerr MARWS_BCT_loglik MARWS_log MARWS_sqrt
## 1 1 1980   Jan  464        37.13267         41.08132  6.139885   21.54066
## 2 2 1980   Feb  675        44.81451         49.96152  6.514713   25.98076
## 3 3 1980   Mar  703        45.73306         51.02829  6.555357   26.51415
## 4 4 1980   Apr  887        51.34379         57.56509  6.787845   29.78255
## 5 5 1980   May 1139        58.11541         65.49815  7.037906   33.74907
## 6 6 1980   Jun 1077        56.52878         63.63536  6.981935   32.81768
MARWS_train_log_df <-  subset(MARWS_train_df, select=-c(MARWS_BCT_Guerr, MARWS_BCT_loglik, MARWS_sqrt))
head(MARWS_train_log_df)
##   t Year Month  RWS MARWS_log
## 1 1 1980   Jan  464  6.139885
## 2 2 1980   Feb  675  6.514713
## 3 3 1980   Mar  703  6.555357
## 4 4 1980   Apr  887  6.787845
## 5 5 1980   May 1139  7.037906
## 6 6 1980   Jun 1077  6.981935
MARWS_train_log_df <- add_column(MARWS_train_log_df, MARWS_log_Fit=MARWS_log_lm$fitted.values,
                           MARWS_log_Res=MARWS_log_lm$residuals, MARWS_log_Sqrt_Abs_Res=sqrt(abs(MARWS_log_lm$residuals)),
                           .after="MARWS_log")
head(MARWS_train_log_df)
##   t Year Month  RWS MARWS_log MARWS_log_Fit MARWS_log_Res
## 1 1 1980   Jan  464  6.139885      6.800849   -0.66096456
## 2 2 1980   Feb  675  6.514713      6.806702   -0.29198920
## 3 3 1980   Mar  703  6.555357      6.812555   -0.25719778
## 4 4 1980   Apr  887  6.787845      6.818407   -0.03056247
## 5 5 1980   May 1139  7.037906      6.824260    0.21364573
## 6 6 1980   Jun 1077  6.981935      6.830113    0.15182167
##   MARWS_log_Sqrt_Abs_Res
## 1              0.8129973
## 2              0.5403603
## 3              0.5071467
## 4              0.1748212
## 5              0.4622183
## 6              0.3896430

We draw the scatter plot of Residuals vs Fitted in the MARWS_log_lm model.

# The scatter plot
Data_df  <- MARWS_train_log_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Residuals vs Fitted in the Linear Model for AU Red Wine Monthly Logarithm Sales from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("fitted values")
x_breaks_num <- 10
x_binwidth <- round((max(Data_df$MARWS_log_Fit)-min(Data_df$MARWS_log_Fit))/x_breaks_num, digits=1)
x_breaks_low <- floor((min(Data_df$MARWS_log_Fit)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$MARWS_log_Fit)/x_binwidth))*x_binwidth
x_breaks <- c(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth))
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote("residuals")
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_log_Res)-min(Data_df$MARWS_log_Res))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_log_Res)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_log_Res)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Path of residuals vs fitted")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_log_Res_sp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=MARWS_log_Fit, y=MARWS_log_Res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=MARWS_log_Fit, y=MARWS_log_Res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=MARWS_log_Fit, y=MARWS_log_Res, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), 
                                                           linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_log_Res_sp)

We draw the Scale-Location scatter plot (square roots of absolute residuals vs fitted values) for the MARWS_log_lm model.

# The scatter plot
Data_df  <- MARWS_train_log_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scale Location Plot of Residuals in the Linear Model for AU Red Wine Monthly Logarithm Sales from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points. Data by courtesy of R. Hyndman et al"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("fitted values")
x_breaks_num <- 10
x_binwidth <- round((max(Data_df$MARWS_log_Fit)-min(Data_df$MARWS_log_Fit))/x_breaks_num, digits=1)
x_breaks_low <- floor((min(Data_df$MARWS_log_Fit)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$MARWS_log_Fit)/x_binwidth))*x_binwidth
x_breaks <- c(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth))
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote("square roots of absolute residuals")
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$MARWS_log_Sqrt_Abs_Res)-min(Data_df$MARWS_log_Sqrt_Abs_Res))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$MARWS_log_Sqrt_Abs_Res)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$MARWS_log_Sqrt_Abs_Res)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0.5
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Path of square roots of absolute residuals vs fitted")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_log_Sqrt_Abs_Res_sp <- ggplot(Data_df) +
  geom_hline(yintercept = mean(Data_df$MARWS_log_Sqrt_Abs_Res), size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=MARWS_log_Fit, y=MARWS_log_Sqrt_Abs_Res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=MARWS_log_Fit, y=MARWS_log_Sqrt_Abs_Res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=MARWS_log_Fit, y=MARWS_log_Sqrt_Abs_Res, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(shape=c(19,NA,NA), 
                                                           linetype=c("blank", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=0, vjust=1),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(MARWS_log_Sqrt_Abs_Res_sp)

We have visual evidence of mean stationarity from the Residuals vs Fitted scatter plot. Moreover, we have no clear evidence of heteroskedasticity in the Scale-Location plot.

Finally, we apply the DF test to the residuals of the MARWS_log_lm model.

# library(urca)                     # The library for this version of the test.
y <- MARWS_train_log_df$MARWS_log_Res       # Choosing the data set to be tested.
no_lags <- 0                        # Setting the lag parameter for the test.

Res_DF_none <- ur.df(y, type="none", lags=no_lags, selectlags="Fixed")    
# Applying the form of the DF test which considers the null hypothesis that the data set is generated by 
# a process with a random walk component, while the alternative hypothesis is that the data set is generated 
# by an autoregressive process with no drift and trend.
summary(Res_DF_none) # Showing the result of the test
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.01240 -0.06123  0.03968  0.14041  0.45316 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## z.lag.1 -0.54980    0.06776  -8.114 1.04e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2379 on 166 degrees of freedom
## Multiple R-squared:  0.284,  Adjusted R-squared:  0.2797 
## F-statistic: 65.84 on 1 and 166 DF,  p-value: 1.041e-13
## 
## 
## Value of test-statistic is: -8.1139 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

The test statistic of the DF test falls inside the rejection region at the significance level \(\alpha=0.01\), i.e., \(\alpha=1\%\) (\(-8.1139 < -2.58\)). Therefore, we can reject the unit root null hypothesis in favor of the mean stationary alternative.

We apply the KPSS test.

# library(urca)                  # The library for this version of the test
y <- MARWS_train_log_df$MARWS_log_Res    # Choosing the data set to be tested

Res_KPSS_mu <- ur.kpss(y, type="mu", lags="nil", use.lag=NULL)    
# Applying the simplest form of the KPSS test which considers the null hypothesis that
# the data set is generated by an autoregressive process with constant mean,
# while the alternative hypothesis is that the data set is generated by a process with a random walk component.

summary(Res_KPSS_mu)    # Showing the result of the test
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 0 lags. 
## 
## Value of test-statistic is: 0.1635 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic of the KPSS test takes value outside the rejection region even at the \(10\%\) significance level (\(0.1635 < 0.347\)). Therefore, we cannot reject the null hypothesis that the residuals are generated by a stationary process.

Thereafter, we apply the BP and W homoskedasticity tests.

First, we build a convenient data frame and linear model.

Data_df  <- data.frame(x=MARWS_train_log_df$t, y=MARWS_train_log_df$MARWS_log, y_fit=MARWS_train_log_df$MARWS_log_Fit, 
                       y_res=MARWS_train_log_df$MARWS_log_Res)
Data_lm <- lm(y~x, data=Data_df)                              # The original linear model

Second, we apply the studentized BP test.

# Studentized Breusch-Pagan test
lmtest::bptest(Data_lm, studentize = TRUE)
## 
##  studentized Breusch-Pagan test
## 
## data:  Data_lm
## BP = 0.12202, df = 1, p-value = 0.7269

We also apply another version of the BP test, the \(F\)-variant.

# (olsrr - F Test)
olsrr::ols_test_f(Data_lm, fitted_values = TRUE, rhs = FALSE)
## 
##  F Test for Heteroskedasticity
##  -----------------------------
##  Ho: Variance is homogenous
##  Ha: Variance is not homogenous
## 
##  Variables: fitted values of y 
## 
##       Test Summary        
##  -------------------------
##  Num DF     =    1 
##  Den DF     =    166 
##  F          =    0.1206561 
##  Prob > F   =    0.728764

In light of the results of the two variants of the BP test, \(\chi^{2}\)-variant and \(F\)-variant, we cannot reject the homoskedasticity null hypothesis at the \(10\%\) significance level.

However, we also consider the W Test.

# library(lmtest)
var.formula <- ~ x+I(x^2)
lmtest::bptest(formula = y ~ x, varformula = var.formula, studentize = TRUE, data=Data_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  y ~ x
## BP = 0.50798, df = 2, p-value = 0.7757

The White test cannot reject the null hypothesis of homoskedasticity at the \(10\%\) significance level.

In light of the visual inspections and the results of the BP and W tests, we cannot reject the null hypothesis that the residuals of the MARWS_log_lm model are generated by homoskedastic noise.

As mentioned above, a Box-Cox transformation cannot remove seasonality from a time series. Actually, if we consider the autocorrelogram of the residuals of the MARWS_log_lm model, we can still see the presence of seasonality.

Data_df <- MARWS_train_log_df
y <- Data_df$MARWS_log_Res
length <- length(y)
T <- length
# maxlag <- ceiling(10*log10(T))      # Default
# maxlag <- ceiling(sqrt(n)+45)       # Box-Jenkins
# maxlag <- ceiling(min(10, T/4))     # Hyndman (for data without seasonality)
maxlag <- ceiling(min(2*12, T/5))     # Hyndman https://robjhyndman.com/hyndsight/ljung-box-test/
Aut_Fun_y <- acf(y, lag.max = maxlag, type="correlation", plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_Aut_Fun_y <- data.frame(lag=Aut_Fun_y$lag, acf=Aut_Fun_y$acf)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Plot of the Autocorrelogram of the Residuals in the Linear Model for AU Red Wine Monthly Logarithm Sales from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_Aut_Fun_y, aes(x=lag, y=acf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=acf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="acf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

We still need to demean and deseasonalize the transformed homoskedastic time series. We stress that the decomposition presented below is just one of the many that could be performed. It is a rather naive decomposition but achieves its goal (see Hyndman R.J., Athanasopoulos G., Forecasting: Principles and Practice (2nd ed), Chapter 6 - Time series decomposition https://otexts.com/fpp2/decomposition.html for a survey on other possible decompositions).
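
For instance, R's built-in classical additive decomposition, which estimates the trend with a centered moving average rather than with annual means, could be applied along the same lines. A minimal sketch on a hypothetical toy monthly series (not the MARWS data):

```r
# Classical additive decomposition via stats::decompose on a toy monthly series.
set.seed(123)
toy_ts <- ts(log(500 + 30*(1:48) + 100*sin(2*pi*(1:48)/12) + rnorm(48, sd = 20)),
             frequency = 12, start = c(1980, 1))
toy_dec <- decompose(toy_ts, type = "additive")
str(toy_dec$seasonal)     # monthly seasonal component, one value per month
str(toy_dec$trend)        # moving-average trend (NA at the series ends)
```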

We start by building a time series object of class ts from the logarithm transform of the MARWS time series.

y <- MARWS_train_df$MARWS_log
MARWS_log_ts = ts(y, frequency=12, start=c(1980,1))
class(MARWS_log_ts)
## [1] "ts"
str(MARWS_log_ts)
##  Time-Series [1:168] from 1980 to 1994: 6.14 6.51 6.56 6.79 7.04 ...
window(MARWS_log_ts, end=c(1982,12))
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1980 6.139885 6.514713 6.555357 6.787845 7.037906 6.981935 7.183871 7.138867
## 1981 6.272877 6.783325 6.795706 6.951772 7.089243 7.160069 7.355641 7.363280
## 1982 6.298949 6.453625 6.689599 6.887553 6.925595 6.969791 7.247081 7.159292
##           Sep      Oct      Nov      Dec
## 1980 7.021084 6.870053 6.903747 6.866933
## 1981 6.981006 6.822197 6.915723 6.968850
## 1982 7.006695 6.906755 6.903747 6.922644
window(MARWS_log_ts, start=c(1993,01))
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1993 6.792344 7.128496 7.609367 7.721792 7.720905 7.720905 8.025189 8.110728
##           Sep      Oct      Nov      Dec
## 1993 7.547502 7.647786 7.772332 7.837949

From the inspection of the time series MARWS_log_ts, we note that the monthly data of the last year are complete. This is a convenient feature when dealing with the in-sample subset. Hence, as a first step to demean the MARWS_log_ts time series, we rewrite it in matrix form.
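
The reshape idiom used below, t(matrix(as.vector(.), nrow=12)), lays the series out year by year. A minimal sketch on a hypothetical two-year toy series:

```r
# A two-year toy monthly series: the reshape yields a 2 x 12 matrix with
# years along the rows and months along the columns.
toy_ts <- ts(1:24, frequency = 12, start = c(1980, 1))
toy_mat <- t(matrix(as.vector(toy_ts), nrow = 12))
show(dim(toy_mat))        # 2 rows (years), 12 columns (months)
show(toy_mat[, 1])        # the two January values
rowMeans(toy_mat)         # annual averages by row, as computed below
```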

# Transforming *MARWS_log_ts* in a matrix.
MARWS_log_mat <- t(matrix(as.vector(MARWS_log_ts), nrow=12))                            
show(MARWS_log_mat)
##           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
##  [1,] 6.139885 6.514713 6.555357 6.787845 7.037906 6.981935 7.183871 7.138867
##  [2,] 6.272877 6.783325 6.795706 6.951772 7.089243 7.160069 7.355641 7.363280
##  [3,] 6.298949 6.453625 6.689599 6.887553 6.925595 6.969791 7.247081 7.159292
##  [4,] 6.421622 6.582025 6.723832 6.884487 7.146772 7.270313 7.326466 7.443078
##  [5,] 6.549651 6.721426 6.903747 7.024649 7.284821 7.146772 7.469084 7.722235
##  [6,] 6.695799 6.904751 7.059618 7.094235 7.338238 7.321850 7.228388 7.641564
##  [7,] 6.658011 6.912743 7.084226 7.327781 7.338888 7.343426 7.657283 7.751905
##  [8,] 6.701960 7.047517 7.110696 7.433075 7.472501 7.469654 7.649693 7.631432
##  [9,] 6.873164 7.345365 7.338238 7.385231 7.639161 7.667158 7.974877 7.718241
## [10,] 7.037028 7.265430 7.500529 7.474772 7.696213 7.633854 7.825245 7.669028
## [11,] 6.877296 7.089243 7.448916 7.428333 7.613325 7.626083 7.799343 7.763446
## [12,] 6.914731 7.417580 7.403670 7.325149 7.512618 7.699389 7.945201 7.780303
## [13,] 7.100027 7.196687 7.606387 7.528332 7.577634 7.674153 7.949797 7.707063
## [14,] 6.792344 7.128496 7.609367 7.721792 7.720905 7.720905 8.025189 8.110728
##           [,9]    [,10]    [,11]    [,12]
##  [1,] 7.021084 6.870053 6.903747 6.866933
##  [2,] 6.981006 6.822197 6.915723 6.968850
##  [3,] 7.006695 6.906755 6.903747 6.922644
##  [4,] 7.048386 6.839476 7.055313 7.097549
##  [5,] 7.096721 7.123673 7.142827 7.510978
##  [6,] 7.213032 7.336937 7.330405 7.226936
##  [7,] 7.375256 7.212294 7.347944 7.385851
##  [8,] 7.606885 7.548029 7.582738 7.689829
##  [9,] 7.540622 7.461066 7.510978 7.532624
## [10,] 7.651120 7.586804 7.687539 7.759614
## [11,] 7.709757 7.524021 7.671827 7.734559
## [12,] 7.743270 7.487174 7.624131 7.682943
## [13,] 7.687997 7.596894 7.778630 7.909857
## [14,] 7.547502 7.647786 7.772332 7.837949
# The way MARWS_log_mat is built makes it a matrix with the same structure as the time series MARWS_log_ts,
# that is, the matrix MARWS_log_mat has each year [resp. month] along a row [resp. column].

# Annual averaging by row averages.
MARWS_log_ann_av_vec <- rowMeans(MARWS_log_mat, na.rm=TRUE)                  
show(MARWS_log_ann_av_vec)
##  [1] 6.833516 6.954974 6.864277 6.986610 7.141382 7.199313 7.282967 7.412001
##  [9] 7.498894 7.565598 7.523846 7.544680 7.609455 7.636275
# Transforming the annual average vector into a trend vector, in which each annual average is repeated for twelve entries.
MARWS_log_trend_vec <- rep(MARWS_log_ann_av_vec, each=12)           
show(MARWS_log_trend_vec[1:36]) # Showing the first 36 entries.
##  [1] 6.833516 6.833516 6.833516 6.833516 6.833516 6.833516 6.833516 6.833516
##  [9] 6.833516 6.833516 6.833516 6.833516 6.954974 6.954974 6.954974 6.954974
## [17] 6.954974 6.954974 6.954974 6.954974 6.954974 6.954974 6.954974 6.954974
## [25] 6.864277 6.864277 6.864277 6.864277 6.864277 6.864277 6.864277 6.864277
## [33] 6.864277 6.864277 6.864277 6.864277
# Transforming the trend vector into a time series.
MARWS_log_trend_ts <- ts(MARWS_log_trend_vec, start=c(1980,1), frequency=12)
str(MARWS_log_trend_ts)
##  Time-Series [1:168] from 1980 to 1994: 6.83 6.83 6.83 6.83 6.83 ...
window(MARWS_log_trend_ts, end=c(1982,12))    # Showing the first two years.
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1980 6.833516 6.833516 6.833516 6.833516 6.833516 6.833516 6.833516 6.833516
## 1981 6.954974 6.954974 6.954974 6.954974 6.954974 6.954974 6.954974 6.954974
## 1982 6.864277 6.864277 6.864277 6.864277 6.864277 6.864277 6.864277 6.864277
##           Sep      Oct      Nov      Dec
## 1980 6.833516 6.833516 6.833516 6.833516
## 1981 6.954974 6.954974 6.954974 6.954974
## 1982 6.864277 6.864277 6.864277 6.864277
window(MARWS_log_trend_ts, start=c(1993,01))  # Showing the last two years.
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1993 7.636275 7.636275 7.636275 7.636275 7.636275 7.636275 7.636275 7.636275
##           Sep      Oct      Nov      Dec
## 1993 7.636275 7.636275 7.636275 7.636275

The time series MARWS_log_trend_ts is the trend component of the MARWS_log_ts time series. We consider its draft plot.

# Draft plot of the trend time series
plot(MARWS_log_trend_ts, type="l", col="blue", xlab="date", ylab="log kilotitres", main="AU Red Wine Monthly Sales - Annnual Average Component (log kliters) from 01-1980 to 07-1995")

We build the detrended MARWS_log_det_ts time series.

# Removing the trend.
MARWS_log_det_ts <- MARWS_log_ts-MARWS_log_trend_ts                       

Draft plot of the MARWS_log_det_ts time series.

# Draft plot of the detrended time series
plot(MARWS_log_det_ts, col="blue", xlab="date", ylab="log kilotitres", main="AU Red Wine Monthly Sales - Detrended by Annual Average Component (log kliters) from 01-1980 to 07-1995")

Now, we consider the deseasonalization of the detrended MARWS_log_det_ts time series.

# Writing the detrended time series in a matrix form.
MARWS_log_det_mat <- t(matrix(as.vector(MARWS_log_det_ts), nrow=12))
# Showing the first 6 rows of the detrended time series in matrix form.
show(MARWS_log_det_mat[1:6,1:12]) 
##            [,1]       [,2]       [,3]         [,4]       [,5]        [,6]
## [1,] -0.6936317 -0.3188036 -0.2781594 -0.045671300 0.20438968 0.148418395
## [2,] -0.6820972 -0.1716490 -0.1592684 -0.003202015 0.13426898 0.205095028
## [3,] -0.5653279 -0.4106521 -0.1746779  0.023275439 0.06131806 0.105513537
## [4,] -0.5649877 -0.4045849 -0.2627776 -0.102123356 0.16016217 0.283702878
## [5,] -0.5917312 -0.4199563 -0.2376347 -0.116732950 0.14343893 0.005390199
## [6,] -0.5035138 -0.2945620 -0.1396951 -0.105077881 0.13892542 0.122536987
##            [,7]      [,8]        [,9]       [,10]       [,11]      [,12]
## [1,] 0.35035443 0.3053507  0.18756768  0.03653713  0.07023098 0.03341700
## [2,] 0.40066692 0.4083054  0.02603156 -0.13277679 -0.03925073 0.01387620
## [3,] 0.38280345 0.2950148  0.14241809  0.04247765  0.03947012 0.05836676
## [4,] 0.33985561 0.4564684  0.06177640 -0.14713357  0.06870284 0.11093884
## [5,] 0.32770190 0.5808528 -0.04466060 -0.01770920  0.00144542 0.36959577
## [6,] 0.02907572 0.4422517  0.01371893  0.13762419  0.13109249 0.02762329
# Monthly (seasonal) averaging by column averages.
MARWS_log_monthly_av_vec <- colMeans(MARWS_log_det_mat, na.rm=TRUE)
# Showing the seasonal averages.
show(MARWS_log_monthly_av_vec)
##  [1] -0.622888781 -0.335061565 -0.158849809 -0.057055868  0.095716626
##  [6]  0.116540343  0.327383710  0.324762494  0.083967480 -0.006473378
## [11]  0.083863876  0.148094871
# Writing the monthly averages in vector form.
MARWS_log_seas_vec <- rep(MARWS_log_monthly_av_vec, times=nrow(MARWS_log_mat))
# Showing the first 36 monthly averages.
show(MARWS_log_seas_vec[1:36])                                               
##  [1] -0.622888781 -0.335061565 -0.158849809 -0.057055868  0.095716626
##  [6]  0.116540343  0.327383710  0.324762494  0.083967480 -0.006473378
## [11]  0.083863876  0.148094871 -0.622888781 -0.335061565 -0.158849809
## [16] -0.057055868  0.095716626  0.116540343  0.327383710  0.324762494
## [21]  0.083967480 -0.006473378  0.083863876  0.148094871 -0.622888781
## [26] -0.335061565 -0.158849809 -0.057055868  0.095716626  0.116540343
## [31]  0.327383710  0.324762494  0.083967480 -0.006473378  0.083863876
## [36]  0.148094871
# Transforming the monthly average vector into a time series.
MARWS_log_seas_ts <- ts(MARWS_log_seas_vec, start=c(1980,1), frequency=12)
# Showing the first 24 entries of the time series.
head(MARWS_log_seas_ts,24)
##               Jan          Feb          Mar          Apr          May
## 1980 -0.622888781 -0.335061565 -0.158849809 -0.057055868  0.095716626
## 1981 -0.622888781 -0.335061565 -0.158849809 -0.057055868  0.095716626
##               Jun          Jul          Aug          Sep          Oct
## 1980  0.116540343  0.327383710  0.324762494  0.083967480 -0.006473378
## 1981  0.116540343  0.327383710  0.324762494  0.083967480 -0.006473378
##               Nov          Dec
## 1980  0.083863876  0.148094871
## 1981  0.083863876  0.148094871
# Showing the last 24 entries of the time series.
tail(MARWS_log_seas_ts,24)
##               Jan          Feb          Mar          Apr          May
## 1992 -0.622888781 -0.335061565 -0.158849809 -0.057055868  0.095716626
## 1993 -0.622888781 -0.335061565 -0.158849809 -0.057055868  0.095716626
##               Jun          Jul          Aug          Sep          Oct
## 1992  0.116540343  0.327383710  0.324762494  0.083967480 -0.006473378
## 1993  0.116540343  0.327383710  0.324762494  0.083967480 -0.006473378
##               Nov          Dec
## 1992  0.083863876  0.148094871
## 1993  0.083863876  0.148094871

The time series MARWS_log_seas_ts is the seasonal component of the MARWS_log_ts time series. We consider the draft plot.

plot(MARWS_log_seas_ts, col="blue", xlab="date", ylab="log kilolitres", main="AU Red Wine Monthly Sales - Monthly Seasonal Component (log kliters) from 01-1980 to 07-1995")

Building the deseasonalized MARWS_log time series.

MARWS_log_deseas_ts <- MARWS_log_ts-MARWS_log_seas_ts

Draft plot of the deseasonalized MARWS_log time series.

plot(MARWS_log_deseas_ts, xlab="time", ylab="", main="AU Red Wine Monthly Sales - Deseasonalized by Monthly Seasonal Component (log kliters) from 01-1980 to 07-1995")

Building the detrended and deseasonalized MARWS_log time series, that is, the residual component of the time series.

MARWS_log_res_ts <- MARWS_log_ts - MARWS_log_trend_ts - MARWS_log_seas_ts

Draft plot of the residual component of the MARWS_log time series.

plot(MARWS_log_res_ts, xlab="time", ylab="", main="AU Red Wine Monthly Sales - Noise Component By Detrending and Deseasonalizing (log kliters) from 01-1980 to 07-1995")

We store the results of our decomposition procedure in a data frame.

MARWS_log_dec_df <- data.frame(t=MARWS_train_df$t, Year=MARWS_train_df$Year, Month=MARWS_train_df$Month, y=MARWS_train_df$MARWS_log, y_trend=as.vector(MARWS_log_trend_ts), y_seas=as.vector(MARWS_log_seas_ts), 
y_res=as.vector(MARWS_log_res_ts), y_det=as.vector(MARWS_log_det_ts), y_deseas=as.vector(MARWS_log_deseas_ts))

Now, we need to analyze the residual component of the MARWS_log time series. Before proceeding with our analysis, however, it is worth showing that very similar results can easily be obtained with the R function stl, which computes a seasonal-trend-residual decomposition of a time series with a seasonal component.

# library(xts)
MARWS_log_stl <- stl(MARWS_log_ts, s.window="periodic")
class(MARWS_log_stl)
## [1] "stl"
# Showing the first 24 entries of the stl decomposition after transforming the stl object in a df object.
MARWS_train_log_df <- as.data.frame(as.xts(MARWS_log_stl$time.series))
head(MARWS_train_log_df, 24)
##             seasonal    trend    remainder
## Jan 1980 -0.59427976 6.777918 -0.043753227
## Feb 1980 -0.31235460 6.789941  0.037126315
## Mar 1980 -0.14204491 6.801964 -0.104562597
## Apr 1980 -0.04531184 6.813602  0.019554474
## May 1980  0.10239981 6.825240  0.110265860
## Jun 1980  0.11866702 6.837151  0.026116429
## Jul 1980  0.32495385 6.849062  0.009854683
## Aug 1980  0.31783330 6.861554 -0.040520039
## Sep 1980  0.07253890 6.874045  0.074499765
## Oct 1980 -0.02347288 6.886389  0.007137168
## Nov 1980  0.06129343 6.898733 -0.056279117
## Dec 1980  0.11977770 6.911771 -0.164615297
## Jan 1981 -0.59427976 6.924809 -0.057652056
## Feb 1981 -0.31235460 6.936586  0.159093938
## Mar 1981 -0.14204491 6.948363 -0.010612204
## Apr 1981 -0.04531184 6.953573  0.043510582
## May 1981  0.10239981 6.958784  0.028059393
## Jun 1981  0.11866702 6.956441  0.084961033
## Jul 1981  0.32495385 6.954098  0.076588877
## Aug 1981  0.31783330 6.942531  0.102914959
## Sep 1981  0.07253890 6.930964 -0.022497443
## Oct 1981 -0.02347288 6.916448 -0.070777448
## Nov 1981  0.06129343 6.901931 -0.047501130
## Dec 1981  0.11977770 6.889033 -0.039960531

We show a draft of the plots of the components of the stl decomposition.

plot(MARWS_log_stl)

We compare draft plots of the components of the two decompositions.

The seasonal components.

par(mfrow=c(2,1))
plot(MARWS_log_seas_ts, type="l", main="AU Red Wine Monthly Sales - Seasonal Component (log kliters) from 01-1980 to 07-1995")
plot(MARWS_log_stl$time.series[,"seasonal"], type="l", main="AU Red Wine Monthly Sales - stl Seasonal Component (log kliters) from 01-1980 to 07-1995")

The trend components.

par(mfrow=c(2,1))
plot(MARWS_log_trend_ts, type="l", main="AU Red Wine Monthly Sales - Trend Component (log kliters) from 01-1980 to 07-1995")
plot(MARWS_log_stl$time.series[,"trend"], type="l", main="AU Red Wine Monthly Sales - stl Trend Component (log kliters) from 01-1980 to 07-1995")

The residual components.

par(mfrow=c(2,1))
plot(MARWS_log_res_ts, type="l", main="AU Red Wine Monthly Sales - Noise Component (log kliters) from 01-1980 to 07-1995")
plot(MARWS_log_stl$time.series[,"remainder"], type="l", main="AU Red Wine Monthly Sales - stl Noise Component (log kliters) from 01-1980 to 07-1995")

We start our analysis of the residual component of the MARWS_log time series by drawing a more detailed line plot of the residuals against the time indices.

# The line plot
Data_df <- MARWS_log_dec_df
length <- nrow(Data_df)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Noise Component of the AU Red Wine Monthly Logarithm Sales from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
y_name <- bquote("residuals (log kliters)")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$y_res)-min(Data_df$y_res))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$y_res)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$y_res)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <-  1
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Log Red Wine Monthly Sales - Noise Comp.")
col_2 <- bquote("LOESS curve")
col_3 <- bquote("regression line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_breaks <- c("col_1", "col_2", "col_3")
MARWS_log_res_lp <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=y_res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=y_res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.8, linetype="solid", aes(x=t, y=y_res, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(linetype=c("solid", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(1.0,"cm"), legend.position="bottom")
plot(MARWS_log_res_lp)

We also consider the summary statistics of the residual component,

y <- MARWS_log_dec_df$y_res
summary(y)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.298308 -0.058967 -0.004919  0.000000  0.058038  0.256090

with standard deviation.

MARWS_log_dec_res <- MARWS_log_dec_df$y_res
sd(MARWS_log_dec_res)
## [1] 0.09353815

The five summary points of the residual component of the MARWS_log time series show more symmetry and fit better the corresponding summary points of the zero-centered Gaussian distribution with standard deviation \(sd(MARWS\_log\_dec\_res)\). In fact, for such a Gaussian distribution we have the following theoretical values (first row), to be compared with the computed values drawn from the summary above (second row): \[\begin{equation} \begin{array}{ccccc} Min\,(99.73\%) & 1Q & Median & 3Q & Max\,(99.73\%)\\ -0.280614 & -0.063090 & 0.000000 & 0.063090 & 0.280614\\ -0.298308 & -0.058967 & -0.004919 & 0.058038 & 0.256090 \end{array} \end{equation}\] where \[\begin{equation} \begin{array}{c} Min\,(99.73\%)= Mean-3*sd(MARWS\_log\_dec\_res), \qquad Max\,(99.73\%)= Mean+3*sd(MARWS\_log\_dec\_res),\\ 1Q = qnorm(0.25, mean = 0, sd = sd(MARWS\_log\_dec\_res), lower.tail = TRUE),\\ 3Q = qnorm(0.75, mean = 0, sd = sd(MARWS\_log\_dec\_res), lower.tail = TRUE).\\ \end{array} \end{equation}\]
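These theoretical summary points can be reproduced directly in R; a minimal sketch, assuming the MARWS_log_dec_df data frame built above is in scope (the variable names are illustrative):

```r
# Theoretical summary points of a zero-centered Gaussian distribution with
# the residuals' sample standard deviation (illustrative names).
sigma_res <- sd(MARWS_log_dec_df$y_res)
theor_points <- c(min_9973 = -3*sigma_res,                       # Mean - 3*sd
                  Q1       = qnorm(0.25, mean=0, sd=sigma_res),
                  med      = 0,
                  Q3       = qnorm(0.75, mean=0, sd=sigma_res),
                  max_9973 = 3*sigma_res)                        # Mean + 3*sd
show(round(theor_points, 6))
```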

In addition, given the first and third quartiles \(1Q\) and \(3Q\), respectively, we know that the quartile estimates of the mean and standard deviation of the corresponding Gaussian are given by \[\begin{equation} \hat{\mu} =(1Q+3Q)/2=-0.000464 \quad\text{and}\quad \hat{\sigma}=(3Q-1Q)/(2*0.67448)=0.086738, \tag{3.39} \end{equation}\] where \(0.67448\approx qnorm(0.75)\) is the third quartile of the standard Gaussian distribution.

In fact, replacing the values for \(1Q\) and \(3Q\) drawn from the summary

MARWS_log_dec_res_summ <- summary(MARWS_log_dec_res)
show(MARWS_log_dec_res_summ)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.298308 -0.058967 -0.004919  0.000000  0.058038  0.256090

we obtain

quart_mean <- as.vector(MARWS_log_dec_res_summ[2]+MARWS_log_dec_res_summ[5])/2
show(round(quart_mean,6))
## [1] -0.000464
quart_sd <- round(as.vector(MARWS_log_dec_res_summ[5]-MARWS_log_dec_res_summ[2])/(2*0.67448),6)
show(round(c(quart_sd,sd(MARWS_log_dec_res)),6))
## [1] 0.086738 0.093538

As a consequence, if the residuals of the decomposition of MARWS_log were generated by independent sampling from a Gaussian distribution, then the quartile estimates of the mean and standard deviation of such a Gaussian distribution would take values rather close to zero and \(sd(MARWS\_log\_dec\_res)\), respectively. To quantify how close, we consider the confidence intervals of the mean and the standard deviation of the residuals under the assumption that they are generated by independent sampling from a Gaussian distribution.

MARWS_log_dec_res_t_test <- t.test(x=MARWS_log_dec_res, alternative = "two.sided", mu=0, conf.level=0.95)
show(c(round(MARWS_log_dec_res_t_test$estimate, digits=6),round(MARWS_log_dec_res_t_test$conf.int, digits=6),round(MARWS_log_dec_res_t_test$p.value, digits=6)))
## mean of x                               
##  0.000000 -0.014248  0.014248  1.000000
MARWS_log_dec_res_chisq_test <- EnvStats::varTest(x=MARWS_log_dec_res, alternative="two.sided", sigma.squared=var(MARWS_log_dec_res), conf.level=0.95)
show(c(round(MARWS_log_dec_res_chisq_test$estimate, digits=6),round(MARWS_log_dec_res_chisq_test$conf.int, digits=6),round(MARWS_log_dec_res_chisq_test$p.value, digits=6)))
## variance      LCL      UCL          
## 0.008749 0.007139 0.010977 0.970893

The quartile estimate of the mean, \(\hat{\mu}=-0.000464\), is in the \(95\%\) confidence interval \([-0.014248, 0.014248]\) of the zero residual mean, and the corresponding quartile estimate of the variance, \(\hat{\sigma}^{2}=0.007523481\), is in the \(95\%\) confidence interval \([0.007139, 0.010977]\) of the residual variance \(var(MARWS\_log\_dec\_res)=0.008749386\). This means that, in light of the values taken by the empirical quartiles of the residuals of the decomposition of MARWS_log, we cannot reject the assumption that the residuals have been generated by independent sampling from a zero-centered Gaussian distribution with standard deviation \(sd(MARWS\_log\_dec\_res)=0.09353815\).

For completeness, we also compute the skewness and the excess kurtosis, jointly with the \(95\%\) confidence intervals derived under the assumption of a Gaussian-distributed data set.

Skew_Gauss <- DescTools::Skew(x=MARWS_log_dec_res, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "classic")
show(Skew_Gauss)
##    skewness      lwr.ci      upr.ci 
## -0.06855094 -0.36714798  0.36714798
Kurt_Gauss <- DescTools::Kurt(x=MARWS_log_dec_res, weights = NULL, na.rm = TRUE, method = 2, conf.level = 0.95, ci.type = "classic")
show(Kurt_Gauss)
##   kurtosis     lwr.ci     upr.ci 
##  0.2189116 -0.7301426  0.7301426

Thus, since both confidence intervals contain zero, the residuals of the decomposition of MARWS_log appear to be neither significantly skewed nor significantly kurtotic at the \(95\%\) confidence level.

The basic analysis of the residuals of the decomposition of the time series MARWS_log provides no evidence to reject the assumption that the residuals are generated by independent sampling from the zero-centered Gaussian distribution with standard deviation \(sd(MARWS\_log\_dec\_res)=0.09353815\).

From the plot of the residual component of the MARWS_log time series, we have rather clear evidence for stationarity and also evidence for homoskedasticity. This evidence has to be confirmed computationally, by means of the stationarity and homoskedasticity tests that we have already introduced.

We apply the DF test to the residual component of the MARWS_log time series.

# library(urca)                     # The library for this version of the test.
y <- MARWS_log_dec_df$y_res         # Choosing the data set to be tested.
no_lags <- 0                        # Setting the lag parameter for the test.

Res_DF_none <- ur.df(y, type="none", lags=no_lags, selectlags="Fixed")    
# Applying the form of the DF test which considers the null hypothesis that the data set is generated by 
# a process with a random walk component, while the alternative hypothesis is that the data set is generated 
# by an autoregressive process with no drift and trend.
summary(Res_DF_none) # Showing the result of the test
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.298335 -0.058502 -0.004183  0.058446  0.256089 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## z.lag.1 -0.99548    0.07756  -12.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09366 on 166 degrees of freedom
## Multiple R-squared:  0.4981, Adjusted R-squared:  0.4951 
## F-statistic: 164.7 on 1 and 166 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -12.8354 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

The test statistic of the DF test takes a value inside the rejection region at the significance level \(\alpha=0.01\), or \(\alpha=1\%\). Therefore, we can reject the unit root null hypothesis in favor of the mean-stationary alternative.

We apply the KPSS test.

# library(urca)                # The library for this version of the test
y <- MARWS_log_dec_df$y_res    # Choosing the data set to be tested

Res_KPSS_mu <- ur.kpss(y, type="mu", lags="nil", use.lag=NULL)    
# Applying the simplest form of the KPSS test, which considers the null hypothesis that
# the data set is generated by an autoregressive process with constant mean,
# while the alternative hypothesis is that the data set is generated by a process with a random walk component.

summary(Res_KPSS_mu)    # Showing the result of the test
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 0 lags. 
## 
## Value of test-statistic is: 0.016 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

We cannot reject the null hypothesis of stationarity at the significance level \(\alpha=0.1\), or \(\alpha=10\%\).

In light of the visual evidence and the combined results of the DF and KPSS tests, we cannot reject the hypothesis that the residual component of the MARWS_log time series is generated by a stationary noise.

Thereafter, we consider the BP and W homoskedasticity tests. We refer to the MARWS_log_dec_df data frame and apply the tests to the following data frame.

Data_df <- data.frame(x=MARWS_log_dec_df$t,y=MARWS_log_dec_df$y_res)

Since we have estimated no significant excess kurtosis for the residuals, we apply the unstudentized Breusch-Pagan test.

# Unstudentized Breusch-Pagan test
lmtest::bptest(formula = y~x, varformula = NULL, studentize = FALSE, data=Data_df)
## 
##  Breusch-Pagan test
## 
## data:  y ~ x
## BP = 1.9974, df = 1, p-value = 0.1576
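For intuition on what bptest computes, the unstudentized BP statistic can be reproduced from its auxiliary regression; a minimal sketch, assuming the Data_df data frame built above is in scope:

```r
# Reproducing the unstudentized Breusch-Pagan statistic from its
# auxiliary regression (Data_df as defined above).
fit <- lm(y ~ x, data = Data_df)            # main regression
e2  <- residuals(fit)^2                     # squared residuals
g   <- e2 / mean(e2)                        # normalized squared residuals
aux <- lm(g ~ x, data = Data_df)            # auxiliary regression
BP  <- sum((fitted(aux) - mean(g))^2) / 2   # explained sum of squares / 2
show(BP)   # should match the BP value reported by lmtest::bptest above
```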

We also apply the White test.

# library(lmtest)
var.formula <- ~ x+I(x^2)
lmtest::bptest(formula = y ~ x, varformula = var.formula, studentize = TRUE, data=Data_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  y ~ x
## BP = 1.84, df = 2, p-value = 0.3985

In light of the results of the BP and W tests, we cannot reject the homoskedasticity null at the \(10\%\) significance level.

As a consequence of our stationarity and homoskedasticity tests, we cannot reject the null hypothesis that the residual component of the MARWS_log time series is generated by a stationary and homoskedastic noise.

To complete these promising results on the residual component, we need to check for lack of autocorrelation.

Also in this case we consider first a visual approach and then a computational approach.

Plot of the autocorrelogram of the residual component of the MARWS_log time series.

Data_df <- MARWS_log_dec_df
y <- Data_df$y_res
length <- length(y)
T <- length
# maxlag <- ceiling(10*log10(T))      # Default
# maxlag <- ceiling(sqrt(n)+45)       # Box-Jenkins
maxlag <- ceiling(min(10, T/4))       # Hyndman (for data without seasonality)
# maxlag <- ceiling(min(2*12, T/5))   # Hyndman https://robjhyndman.com/hyndsight/ljung-box-test/
Aut_Fun_y <- acf(y, lag.max = maxlag, type="correlation", plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_Aut_Fun_y <- data.frame(lag=Aut_Fun_y$lag, acf=Aut_Fun_y$acf)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Plot of the Autocorrelogram of the Residual Component of the AU Red Wine Monthly Logarithm Sales from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_Aut_Fun_y, aes(x=lag, y=acf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=acf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="acf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

The number of peaks corresponding to non-zero lags crossing the confidence lines of the autocorrelogram is within the statistical tolerance. In fact, we have no peaks crossing the \(99\%\) and \(95\%\) confidence lines (the strict tolerances are floor(maxlag \(\ast 0.01\))=floor(\(10 \ast 0.01\))=\(0\) and floor(maxlag \(\ast 0.05\))=floor(\(10 \ast 0.05\))=\(0\), respectively) and one peak crossing the \(90\%\) confidence lines (the strict tolerance is floor(maxlag \(\ast 0.10\))=floor(\(10 \ast 0.10\))=\(1\)).
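The peak count against each confidence bound can also be checked programmatically; a minimal sketch, assuming the Aut_Fun_y object and the ci_90, ci_95, ci_99 bounds computed above are in scope:

```r
# Counting sample autocorrelations at non-zero lags that exceed each
# confidence bound (Aut_Fun_y and ci_90/ci_95/ci_99 as computed above).
acf_vals <- as.vector(Aut_Fun_y$acf)[-1]   # drop lag 0, which is always 1
crossings <- c("90%" = sum(abs(acf_vals) > ci_90),
               "95%" = sum(abs(acf_vals) > ci_95),
               "99%" = sum(abs(acf_vals) > ci_99))
show(crossings)
# Under white noise, about 10%, 5%, and 1% of the maxlag lags are expected
# to cross the corresponding bounds by pure chance.
```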

Plot of the partial autocorrelogram.

Data_df <- MARWS_log_dec_df
y <- Data_df$y_res
length <- length(y)
T <- length
# maxlag <- ceiling(10*log10(T))      # Default
# maxlag <- ceiling(sqrt(n)+45)       # Box-Jenkins
maxlag <- ceiling(min(10, T/4))       # Hyndman (for data without seasonality)
# maxlag <- ceiling(min(2*12, T/5))   # Hyndman https://robjhyndman.com/hyndsight/ljung-box-test/
P_Aut_Fun_y <- pacf(y, lag.max = maxlag, plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_P_Aut_Fun_y <- data.frame(lag=P_Aut_Fun_y$lag, pacf=P_Aut_Fun_y$acf)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Plot of the Partial Autocorrelogram of the Residual Component of the AU Red Wine Monthly Logarithm Sales from ", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_P_Aut_Fun_y, aes(x=lag, y=pacf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=pacf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="pacf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

Also the number of peaks crossing the confidence lines of the partial autocorrelogram is almost within the wide statistical tolerance. In fact, we still have no peaks crossing the \(99\%\) confidence lines (the wide tolerance is ceiling(maxlag \(\ast 0.01\))=ceiling(\(10 \ast 0.01\))=\(1\)), one peak crossing the \(95\%\) confidence lines (the wide tolerance is ceiling(maxlag \(\ast 0.05\))=ceiling(\(10 \ast 0.05\))=\(1\)), and two peaks crossing the \(90\%\) confidence lines (the wide tolerance is ceiling(maxlag \(\ast 0.10\))=ceiling(\(10 \ast 0.10\))=\(1\)).

We apply the Ljung-Box (LB) test.

y <- MARWS_log_dec_df$y_res
T <- length(y)
maxlag <- ceiling(min(10, T/4))    # Hyndman (for data without seasonality)
y_LB <- LjungBoxTest(y, k=1, lag.max=maxlag, StartLag=1, SquaredQ=FALSE)
show(y_LB)
##   m    Qm    pvalue
##   1  0.00 0.9530066
##   2  1.49 0.2217523
##   3  2.31 0.3146092
##   4  2.39 0.4956696
##   5  2.42 0.6586917
##   6  5.14 0.3991313
##   7  8.88 0.1801832
##   8  9.97 0.1902466
##   9 10.01 0.2644402
##  10 10.14 0.3390949
plot(y_LB[,3], main="Ljung-Box Q Test", ylab="P-values", xlab="Lag")

The null hypothesis that the residuals have been generated by an independent and identically distributed noise cannot be rejected at the \(10\%\) significance level.
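For reference, the statistic computed above is \(Q_{m} = T(T+2)\sum_{k=1}^{m}\hat{\rho}_{k}^{2}/(T-k)\), where \(\hat{\rho}_{k}\) is the sample autocorrelation at lag \(k\). It can be reproduced from the sample autocorrelations and cross-checked with base R's Box.test; a sketch, assuming y, T, and maxlag as defined above:

```r
# Reproducing the Ljung-Box Q statistics from the sample autocorrelations
# (y, T, and maxlag as defined above).
rho <- as.vector(acf(y, lag.max = maxlag, plot = FALSE)$acf)[-1]  # drop lag 0
Q <- T * (T + 2) * cumsum(rho^2 / (T - seq_along(rho)))
show(round(Q, 2))   # Q_m for m = 1, ..., maxlag; compare with the Qm column above
Box.test(y, lag = maxlag, type = "Ljung-Box")   # base R cross-check at m = maxlag
```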

In light of the evidence from the autocorrelograms and the Ljung-Box test, we cannot reject the null hypothesis that the residual component of the MARWS_log time series has been generated by an independent and identically distributed noise.

Next, we consider the histogram of the residual component.

Standard statistics on the residual component of the MARWS_log time series.

# Statistics of the Res data set
Res <- MARWS_log_dec_df$y_res
mu <- 0.00
sigma <- sd(Res)
Samp_Data <- Res
Statistics=c("mean", "median", "mode", "min. (99.73%)", "max. (99.73%)", "1st quart.", "3rd quart.", "st. dev.", "skew.", "ex. kurt.")

Teor_Stats <- rep(0,10)
Teor_Stats[1] <- as.character(formatC(mu, digits=3, format="f"))
Teor_Stats[2] <- as.character(formatC(qnorm(0.50, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE), digits=3, format="f"))
Teor_Stats[3] <- as.character(formatC(mu, digits=3, format="f"))
Teor_Stats[4] <- as.character(formatC(qnorm(0.00135, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE), digits=3, format="f"))
Teor_Stats[5] <- as.character(formatC(qnorm(0.99865, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE), digits=3, format="f"))
Teor_Stats[6] <- as.character(formatC(qnorm(0.25, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE), digits=3, format="f"))
Teor_Stats[7] <- as.character(formatC(qnorm(0.75, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE), digits=3, format="f"))
Teor_Stats[8] <- as.character(formatC(sigma, digits=3, format="f"))
Teor_Stats[9] <- as.character(formatC(0, digits=3, format="f"))
Teor_Stats[10] <- as.character(formatC(0, digits=3, format="f"))

Samp_Stats <- rep(0,10)
Samp_Stats[1] <- as.character(formatC(mean(Samp_Data), digits=3, format="f"))
Samp_Stats[2] <- as.character(formatC(median(Samp_Data), digits=3, format="f"))
# Note: base R mode() returns the storage mode, not the statistical mode;
# here the sample mode is estimated by the peak of a kernel density.
Samp_Res_dens <- density(Samp_Data)
Samp_Stats[3] <- as.character(formatC(Samp_Res_dens$x[which.max(Samp_Res_dens$y)], digits=3, format="f"))
Samp_Stats[4] <- as.character(formatC(min(Samp_Data), digits=3, format="f"))
Samp_Stats[5] <- as.character(formatC(max(Samp_Data), digits=3, format="f"))
Samp_Stats[6] <- as.character(formatC(quantile(Samp_Data,0.25), digits=3, format="f"))
Samp_Stats[7] <- as.character(formatC(quantile(Samp_Data,0.75), digits=3, format="f"))
Samp_Stats[8] <- as.character(formatC(sd(Samp_Data), digits=3, format="f"))
Samp_Stats[9] <- as.character(formatC(as.numeric(timeDate::skewness(Samp_Data, method="moment")), digits=3, format="f"))
Samp_Stats[10] <- as.character(formatC(as.numeric(timeDate::kurtosis(Samp_Data, method="excess")), digits=3, format="f"))

Table_Stats <- data.frame(Samp_Stats,Teor_Stats)
rownames(Table_Stats) <- Statistics
colnames(Table_Stats) <- c("Samp. Stats", "Teor. Stats")

Relative frequency and density histograms of the residuals.

The relative frequency histogram.

# library(gridExtra)
#### Relative Frequency Histogram + Sample Statistics 
Data_df <- MARWS_log_dec_df
length <- nrow(Data_df)
mu <- 0.00
sigma <- round(sd(MARWS_log_dec_df$y_res), digits=3)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Relative Frequency Histogram of the Residual Component of the AU Red Wine Monthly Logarithm Sales from", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("Data set size=", .(length), " points;    Theoretical Gaussian Distribution Parameters ", mu, " = ", .(mu), ", ", sigma, " = ", .(sigma)))
caption_content <- "Author: Roberto Monte"
# x_breaks_num <- ceiling(length^(1/2)) # Tukey & Mosteller square-root rule
# x_breaks_num <- ceiling(1+log2(length)) # Sturges rule
# x_breaks_num <- ceiling((2*length)^(1/3)) # Teller & Scott rice rule
x_breaks_num <- 10
x_binwidth <- round((max(Data_df$y_res)-min(Data_df$y_res))/x_breaks_num, digits=1)
# x_binwidth <- 0.5
x_breaks_low <- floor((min(Data_df$y_res)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$y_res)/x_binwidth))*x_binwidth
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_breaks <- seq(from=0, to=0.45, by=0.05)
y_labs <- format(percent(y_breaks), scientific=FALSE)
y_lims <- c(-0.010, 0.55)
tt3 <- ttheme_minimal(core=list(fg_params=list(hjust=1, x=0.90)),
                      rowhead=list(fg_params=list(hjust=0, x=0)))
Table_Stats_Grob <- tableGrob(Table_Stats, theme=tt3)
MARWS_log_res_rel_freq_hist <- ggplot(Data_df, aes(x=y_res)) +
  geom_histogram(binwidth=x_binwidth , aes(y=stat(count)/sum(count)), color="black", fill="green", alpha=0.5) +
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name="Data Relative Frequency", breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis=sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust=0.5), 
        plot.subtitle=element_text(hjust=0.5),
        plot.caption=element_text(hjust=1.0)) +
  geom_vline(aes(xintercept=as.numeric(qnorm(0.25, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE))), 
             colour="red", linetype="dotdash", size=0.5) +
  annotate("text", x=as.numeric(qnorm(0.25, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE))
           -0.025, y=0.01, colour="red", 
           label=Teor_Stats[6], hjust=0) +
  geom_vline(aes(xintercept=as.numeric(qnorm(0.75, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE))), 
             colour="red", linetype="dotdash", size=0.5) +
  annotate("text", x=as.numeric(qnorm(0.75, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE))
           +0.005, y=0.01, colour="red", label=Teor_Stats[7], hjust=0) +
  geom_vline(aes(xintercept=as.numeric(quantile(Samp_Data,0.25))), 
             colour="blue", linetype="dotdash", size=0.5) +
  annotate("text", x=as.numeric(quantile(Samp_Data,0.25))+0.005, y=-0.01, colour="blue", 
           label=Samp_Stats[6], hjust=0) +
  geom_vline(aes(xintercept=as.numeric(quantile(Samp_Data,0.75))), 
             colour="blue", linetype="dotdash", size=0.5) +
  annotate("text", x=as.numeric(quantile(Samp_Data,0.75))-0.025, y=-0.01, colour="blue", 
           label=Samp_Stats[7], hjust=0) +
  geom_vline(aes(xintercept=mean(Samp_Data)), colour="blue", linetype="longdash", size=0.5) +
  annotate("text", x=mean(Samp_Data)+0.005, y=-0.01, colour="blue", 
           label=Samp_Stats[1], hjust=0) +
  geom_vline(aes(xintercept=median(Samp_Data)), colour="blue", linetype="dashed", size=0.5) +
  annotate("text", x=median(Samp_Data)-0.025, y=-0.01, colour="blue", 
           label=Samp_Stats[2], hjust=0) +
  annotate("rect", xmin=x_lims[1], xmax=x_lims[1]+0.21, ymin=(y_lims[2]-0.36), ymax=y_lims[2], 
           colour="green", fill="white") +
  annotation_custom(Table_Stats_Grob, xmin=(x_lims[1]+0.07), xmax=(x_lims[1]+0.15), 
                    ymin=(y_lims[2]-0.25), ymax=(y_lims[2]-0.10)) +
  annotate("rect", xmin=(x_lims[2]-0.150), xmax=x_lims[2], ymin=(y_lims[2]-0.13), ymax=y_lims[2], 
           colour="green", fill="white") +
  annotate("segment", x=(x_lims[2]-0.145), xend=(x_lims[2]-0.120), y=(y_lims[2]-0.02), yend=(y_lims[2]-0.02), 
           colour="blue", lty="longdash") +
  annotate("text", x=(x_lims[2]-0.115), y=(y_lims[2]-0.02), colour="black", label="sample mean", hjust=0) +
  annotate("segment", x=(x_lims[2]-0.145), xend=(x_lims[2]-0.120), y=(y_lims[2]-0.05), yend=(y_lims[2]-0.05), 
           colour="blue", lty="dashed") +
  annotate("text", x=(x_lims[2]-0.115), y=(y_lims[2]-0.05), colour="black", label="sample median", hjust=0) +
  annotate("segment", x=(x_lims[2]-0.145), xend=(x_lims[2]-0.120), y=(y_lims[2]-0.08), yend=(y_lims[2]-0.08),
           colour="red", lty="dotdash") +
  annotate("text", x=(x_lims[2]-0.115), y=(y_lims[2]-0.08), colour="black", label="theoretical quantiles", hjust=0) +
  annotate("segment", x=(x_lims[2]-0.145), xend=(x_lims[2]-0.120), y=(y_lims[2]-0.11), yend=(y_lims[2]-0.11), 
           colour="blue", lty="dotdash") +
  annotate("text", x=(x_lims[2]-0.115), y=(y_lims[2]-0.11), colour="black", label="sample quantiles", hjust=0)
plot(MARWS_log_res_rel_freq_hist)
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

The density histogram of the residual component.

Data_df <- MARWS_log_dec_df
length <- nrow(Data_df)
mu <- 0.00
sigma <- round(sd(MARWS_log_dec_df$y_res), digits=3)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Density Histogram of the Residual Component of the AU Red Wine Monthly Logarithm Sales from", .(First_Date), " to ", .(Last_Date))))
subtitle_content <- bquote(paste("Data set size=", .(length), " points;    Theoretical Gaussian Distribution Parameters ", mu, " = ", .(mu), ", ", sigma, " = ", .(sigma)))
caption_content <- "Author: Roberto Monte"
# x_breaks_num <- ceiling(length^(1/2)) # Tukey & Mosteller square-root rule
# x_breaks_num <- ceiling(1+log2(length)) # Sturges rule
# x_breaks_num <- ceiling((2*length)^(1/3)) # Terrell & Scott rule
x_breaks_num <- 10
x_binwidth <- round((max(Data_df$y_res)-min(Data_df$y_res))/x_breaks_num, digits=1)
# x_binwidth <- 0.5
x_breaks_low <- floor((min(Data_df$y_res)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$y_res)/x_binwidth))*x_binwidth
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth,x_breaks_up-J*x_binwidth)
y_breaks <- seq(from=0, to=4.50, by=0.50)
y_labs <- format(y_breaks, scientific=FALSE)
y_lims <- c(-0.085,4.55)
tt3 <- ttheme_minimal(core=list(fg_params=list(hjust=1, x=0.90)),
                      rowhead=list(fg_params=list(hjust=0, x=0)))
Table_Stats_Grob <- tableGrob(Table_Stats, theme=tt3)
MARWS_log_res_dens_hist <- ggplot(Data_df, aes(x=y_res)) +
  geom_histogram(binwidth=x_binwidth, aes(y=after_stat(density)), # Density Histogram
                 color="black", fill="green", alpha=0.5) +
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name="Data Density", breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis=sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(lineheight=0.6, face="bold", hjust=0.5), 
        plot.subtitle=element_text(hjust= 0.5),
        plot.caption=element_text(hjust=1.0)) +
  stat_function(fun=dnorm, colour="red", args=list(mean=mu, sd=sigma)) +
  geom_vline(aes(xintercept=as.numeric(qnorm(0.25, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE))), 
             colour="red", linetype="dotdash", size=0.5) +
  annotate("text", x=as.numeric(qnorm(0.25, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE))
           -0.025, y=0.080, colour="red", 
           label=Teor_Stats[6], hjust=0) +
  geom_vline(aes(xintercept=as.numeric(qnorm(0.75, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE))), 
             colour="red", linetype="dotdash", size=0.5) +
  annotate("text", x=as.numeric(qnorm(0.75, mean=mu, sd=sigma, lower.tail=TRUE, log.p=FALSE))
           +0.005, y=0.080, colour="red", label=Teor_Stats[7], hjust=0) +
  geom_vline(aes(xintercept=as.numeric(quantile(Samp_Data,0.25))), 
             colour="blue", linetype="dotdash", size=0.5) +
  annotate("text", x=as.numeric(quantile(Samp_Data,0.25))+0.005, y=-0.080, colour="blue", 
           label=Samp_Stats[6], hjust=0) +
  geom_vline(aes(xintercept=as.numeric(quantile(Samp_Data,0.75))), 
             colour="blue", linetype="dotdash", size=0.5) +
  annotate("text", x=as.numeric(quantile(Samp_Data,0.75))-0.025, y=-0.080, colour="blue", 
           label=Samp_Stats[7], hjust=0) +
  geom_vline(aes(xintercept=mean(Samp_Data)), colour="blue", linetype="longdash", size=0.5) +
  annotate("text", x=mean(Samp_Data)+0.005, y=-0.080, colour="blue", 
           label=Samp_Stats[1], hjust=0) +
  geom_vline(aes(xintercept=median(Samp_Data)), colour="blue", linetype="dashed", size=0.5) +
  annotate("text", x=median(Samp_Data)-0.025, y=-0.080, colour="blue", 
           label=Samp_Stats[2], hjust=0) +
  # N.B.: mode() is assumed to be a user-defined sample-mode function (base R's mode() returns the storage mode)
  geom_vline(aes(xintercept=mode(Samp_Data)), colour="blue", linetype="dotted", size=0.5) +
  annotate("text", x=mode(Samp_Data)-0.025, y=0.080, colour="blue", 
           label=Samp_Stats[3], hjust=0) +
  geom_density(alpha=.2, colour="blue") +
  annotate("rect", xmin=(x_lims[1]), xmax=x_lims[1]+0.20, ymin=(y_lims[2]-2.90), ymax=y_lims[2], 
           colour="green", fill="white") +
  annotation_custom(Table_Stats_Grob, xmin=(x_lims[1]+0.054), xmax=(x_lims[1]+0.15), 
                    ymin=(y_lims[2]-2.60), ymax=(y_lims[2]-0.35)) +
  annotate("rect", xmin=(x_lims[2]-0.155), xmax=((x_lims[2]-0.155)+0.155), ymin=(y_lims[2]-1.60), ymax=y_lims[2], 
           colour="green", fill="white") +
  annotate("segment", x=((x_lims[2]-0.155)+0.01), xend=((x_lims[2]-0.155)+0.025), y=(y_lims[2]-0.20), yend=(y_lims[2]-0.20), 
           colour="blue", lty="longdash") +
  annotate("text", x=((x_lims[2]-0.155)+0.035), y=(y_lims[2]-0.20), colour="black", label="sample mean", hjust=0) +
  annotate("segment", x=((x_lims[2]-0.155)+0.01), xend=((x_lims[2]-0.155)+0.025), y=(y_lims[2]-0.40), yend=(y_lims[2]-0.40), 
           colour="blue", lty="dashed") +
  annotate("text", x=((x_lims[2]-0.155)+0.035), y=(y_lims[2]-0.40), colour="black", label="sample median", hjust=0) +
  annotate("segment", x=((x_lims[2]-0.155)+0.01), xend=((x_lims[2]-0.155)+0.025), y=(y_lims[2]-0.60), yend=(y_lims[2]-0.60),
           colour="red", lty="dotdash") +
  annotate("text", x=((x_lims[2]-0.155)+0.035), y=(y_lims[2]-0.60), colour="black", label="theoretical quantiles", hjust=0) +
  annotate("segment", x=((x_lims[2]-0.155)+0.01), xend=((x_lims[2]-0.155)+0.025), y=(y_lims[2]-0.80), yend=(y_lims[2]-0.80), 
           colour="blue", lty="dotdash") +
  annotate("text", x=((x_lims[2]-0.155)+0.035), y=(y_lims[2]-0.80), colour="black", label="sample quantiles", hjust=0) +
  annotate("segment", x=((x_lims[2]-0.155)+0.01), xend=((x_lims[2]-0.155)+0.025), y=(y_lims[2]-1.00), yend=(y_lims[2]-1.00), colour="red") +
  annotate("text", x=((x_lims[2]-0.155)+0.035), y=(y_lims[2]-1.00), colour="black", label="theoretical Gaussian density", hjust=0) +
  annotate("segment", x=((x_lims[2]-0.155)+0.01), xend=((x_lims[2]-0.155)+0.025),  y=(y_lims[2]-1.20), yend=(y_lims[2]-1.20), colour="blue") +
  annotate("text", x=((x_lims[2]-0.155)+0.035), y=(y_lims[2]-1.20), colour="black", label="Gaussian kernel density est.", hjust=0) + 
  annotate("segment", x=((x_lims[2]-0.155)+0.01), xend=((x_lims[2]-0.155)+0.025),  y=(y_lims[2]-1.40), yend=(y_lims[2]-1.40), colour="blue", lty="dotted") +
  annotate("text", x=((x_lims[2]-0.155)+0.035), y=(y_lims[2]-1.40), colour="black", label="sample mode", hjust=0)

plot(MARWS_log_res_dens_hist)
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

In light of the relative frequency and density histograms, we have significant visual evidence of Gaussianity in the distribution of the residual component. Therefore, we consider the Q-Q plot of the empirical quantiles associated with the residual component against the corresponding quantiles of the standard Gaussian distribution.

The appropriate set of quantiles can be generated as the quantiles of the standard Gaussian distribution corresponding to an evenly spaced grid of probabilities of size equal to the size of the data set MARWS_log_dec_df$y_res. In R, the standard approach (for samples of size \(n>10\)) is to consider the probability distribution \(\left(p_{k}\right)_{k=1}^{n}\) given by

\[\begin{equation} p_{k}\overset{\text{def}}{=}\frac{k-1/2}{n}, \quad \forall k=1,\dots,n, \end{equation}\] That is

n <- nrow(MARWS_log_dec_df)
probs <- seq(from=(0.5/n), to=(1-(0.5/n)), by=(1/n))
show(probs[1:36])
##  [1] 0.002976190 0.008928571 0.014880952 0.020833333 0.026785714 0.032738095
##  [7] 0.038690476 0.044642857 0.050595238 0.056547619 0.062500000 0.068452381
## [13] 0.074404762 0.080357143 0.086309524 0.092261905 0.098214286 0.104166667
## [19] 0.110119048 0.116071429 0.122023810 0.127976190 0.133928571 0.139880952
## [25] 0.145833333 0.151785714 0.157738095 0.163690476 0.169642857 0.175595238
## [31] 0.181547619 0.187500000 0.193452381 0.199404762 0.205357143 0.211309524
# Equivalently
ppoints <- ppoints(n)
show(ppoints[1:36])
##  [1] 0.002976190 0.008928571 0.014880952 0.020833333 0.026785714 0.032738095
##  [7] 0.038690476 0.044642857 0.050595238 0.056547619 0.062500000 0.068452381
## [13] 0.074404762 0.080357143 0.086309524 0.092261905 0.098214286 0.104166667
## [19] 0.110119048 0.116071429 0.122023810 0.127976190 0.133928571 0.139880952
## [25] 0.145833333 0.151785714 0.157738095 0.163690476 0.169642857 0.175595238
## [31] 0.181547619 0.187500000 0.193452381 0.199404762 0.205357143 0.211309524
all(round(probs, digits=14)==round(ppoints, digits=14))
## [1] TRUE
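For completeness, note that ppoints() uses the plotting positions above only when \(n>10\); for \(n\leq 10\) it switches to the Blom-type positions \(p_{k}=(k-3/8)/(n+1/4)\). A minimal base-R check:

```r
# For small samples (n <= 10), ppoints() uses p_k = (k - 3/8)/(n + 1/4)
n_small <- 8
manual_pp <- ((1:n_small) - 3/8) / (n_small + 1/4)
all.equal(manual_pp, ppoints(n_small))  # TRUE
```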

Then, we generate the empirical quantiles associated with the residual component using the function qemp() from the EnvStats package.

y_res_qemp <- qemp(ppoints, MARWS_log_dec_df$y_res)
show(y_res_qemp[1:36])
##  [1] -0.29830799 -0.23589001 -0.22203831 -0.22107616 -0.17845450 -0.17271974
##  [7] -0.16365626 -0.14473390 -0.14086182 -0.13493280 -0.12923955 -0.12787059
## [13] -0.12756330 -0.12643289 -0.12344440 -0.12074100 -0.11942630 -0.11513622
## [19] -0.11439538 -0.11145870 -0.10595708 -0.10533935 -0.10405623 -0.09993604
## [25] -0.09059697 -0.08969499 -0.08918659 -0.08731872 -0.08532626 -0.08491639
## [31] -0.08261560 -0.07906878 -0.07778932 -0.07574959 -0.07206045 -0.07081778

Similarly, we generate the standard Gaussian quantiles associated with the same set of probabilities using the function qnorm() from the stats package.

Gauss_std_quant <- qnorm(ppoints, mean=0, sd=1)
show(Gauss_std_quant[1:36])
##  [1] -2.7503931 -2.3685671 -2.1732447 -2.0368341 -1.9302859 -1.8419929
##  [7] -1.7660888 -1.6991777 -1.6391094 -1.5844329 -1.5341205 -1.4874168
## [13] -1.4437492 -1.4026733 -1.3638364 -1.3269543 -1.2917938 -1.2581616
## [19] -1.2258953 -1.1948571 -1.1649293 -1.1360100 -1.1080108 -1.0808544
## [25] -1.0544725 -1.0288048 -1.0037976 -0.9794029 -0.9555775 -0.9322826
## [31] -0.9094830 -0.8871466 -0.8652442 -0.8437493 -0.8226373 -0.8018858

Next, we build a data frame, QQ_plot_df, to draw the Q-Q plot.

QQ_plot_df <- data.frame(t=MARWS_train_df$t, x=Gauss_std_quant, y=y_res_qemp)
head(QQ_plot_df)
##   t         x          y
## 1 1 -2.750393 -0.2983080
## 2 2 -2.368567 -0.2358900
## 3 3 -2.173245 -0.2220383
## 4 4 -2.036834 -0.2210762
## 5 5 -1.930286 -0.1784545
## 6 6 -1.841993 -0.1727197

Second, we draw the Q-Q and P-P plots.

The Q-Q plot of the residuals.

# library(qqplotr)
Data_df <- QQ_plot_df
length <- nrow(Data_df)
mu <- 0.00
sigma <- round(sd(MARWS_log_dec_df$y_res), digits=3)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Q-Q plot of the Residual Component of the AU Red Wine Monthly Logarithm Sales Against the Standard Gaussian Distribution"))
subtitle_content <- bquote(paste("Data set size=", .(length), " points;    Theoretical Gaussian Distribution Parameters ", mu, " = ", .(mu), ", ", sigma, " = ", .(sigma)))
caption_content <- "Author: Roberto Monte"
distr <- "norm"
distr_pars <- list(mean=0, sd=1)
x_name <- bquote("Theoretical Quantiles")
y_name <- bquote("Sample Quantiles")
x_breaks_min <- floor(Data_df$x[1])
x_breaks_max <- ceiling(Data_df$x[length])
x_breaks <- seq(from=x_breaks_min, to=x_breaks_max, by=0.5)
x_labs <- format(x_breaks, scientific=FALSE)
x_lims <- c(x_breaks_min,x_breaks_max)
y_breaks_num <- length(x_breaks)
y_binwidth <- round((max(Data_df$y)-min(Data_df$y))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$y)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$y)/y_binwidth))*y_binwidth
y_breaks <- c(round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3))
y_labs <- format(y_breaks, scientific=FALSE)
y_lims <- c(y_breaks_low, y_breaks_up)
shape_1 <- bquote("Q-Q plot")
fill_1 <- bquote("95% confidence bands")
fill_2 <- bquote("99% confidence bands")
col_1 <- bquote("interquartile line")
col_2 <- bquote("regression line")
col_3 <- bquote("y=x line")
leg_shape_labs <- shape_1
leg_fill_labs <- c(fill_1, fill_2)
leg_col_labs <- c(col_1, col_2, col_3)
leg_shape_cols <- c("shape_1" = 19)
leg_fill_cols <- c("fill_1"="gold", "fill_2"="green")
leg_col_cols <- c("col_1"="darkmagenta", "col_2"="red", "col_3"="black")
leg_shape_sort <- "shape_1"
leg_fill_sort <- c("fill_1", "fill_2")
leg_col_sort <- c("col_1", "col_2", "col_3")
Res_QQ_plot <- ggplot(Data_df, aes(sample=y)) +
  stat_qq_band(aes(fill="fill_2"), distribution=distr, dparams=distr_pars, conf = 0.99) +
  stat_qq_band(aes(fill="fill_1"), distribution=distr, dparams=distr_pars, conf = 0.95) +
  stat_qq_line(aes(colour="col_1"), distribution=distr, dparams=distr_pars) +
  stat_smooth(alpha=1, size=0.8, linetype="solid", aes(x=x, y=y, colour="col_2"),
              method="lm" , formula=y~x, se=FALSE, fullrange=FALSE) + 
  geom_abline(aes(slope=1, intercept=0, colour="col_3"),
              size=0.8, linetype="solid", show.legend=FALSE) +
 # geom_segment(aes(x=x[75], xend=-x[75], y=x[75], yend=-x[75], colour="col_3"), 
 #               size=0.8, linetype="solid", show.legend=FALSE) +
  stat_qq_point(aes(shape="shape_1"), colour="blue", alpha=1, size=1.0, 
                distribution=distr, dparams=distr_pars) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis=sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_shape_manual(name="Legend", labels=leg_shape_labs, values=leg_shape_cols, breaks=leg_shape_sort) +
  scale_fill_manual(name="", labels=leg_fill_labs, values=leg_fill_cols, breaks=leg_fill_sort) +
  scale_colour_manual(name="", labels=leg_col_labs, values=leg_col_cols, breaks=leg_col_sort) +
  guides(shape=guide_legend(order=1), fill=guide_legend(order=2), colour=guide_legend(order=3)) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Res_QQ_plot)
## Warning: The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 1 rows containing missing values (`geom_smooth()`).

The Q-Q plot of the residual component of the MARWS_log time series confirms significant visual evidence of Gaussianity of the generating distribution.

Another interesting plot that might yield visual evidence of Gaussianity is the P-P plot. Recall that while the Q-Q plot emphasizes the lack of fit between the empirical distribution function and the theoretical distribution in the tails, the P-P plot emphasizes the lack of fit in the center of the distribution. Recall also that distributional parameters have a large impact on P-P plots. For this reason, we standardize the residuals before drawing the P-P plot.
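As a minimal illustration of the construction (a base-R sketch on simulated data, not the MARWS residuals), a P-P plot pairs the empirical probabilities with the theoretical Gaussian probabilities of the sorted, standardized sample:

```r
set.seed(1)
z <- as.numeric(scale(rnorm(150)))    # standardized simulated sample
p_emp  <- ppoints(length(z))          # empirical probabilities
p_theo <- pnorm(sort(z))              # theoretical standard Gaussian probabilities
plot(p_theo, p_emp, pch=19, cex=0.6,
     xlab="Theoretical Probabilities", ylab="Empirical Probabilities",
     main="P-P plot against the standard Gaussian")
abline(a=0, b=1, col="red")           # y = x reference line
```

For Gaussian data, the points should hug the y = x line, with deviations most visible near the center of the distribution.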

As in the previous case, we build a data frame, PP_plot_df, to draw the P-P plot.

y_sd_res <- (MARWS_log_dec_df$y_res - mean(MARWS_log_dec_df$y_res))/sd(MARWS_log_dec_df$y_res)
# show(ppoints)
y_sd_res_qemp <- qemp(ppoints, y_sd_res)       # Empirical quantiles associated to the residuals.
y_sd_res_pemp <- pemp(y_sd_res_qemp, y_sd_res) # Empirical probability associated to the residuals.
show(y_sd_res_pemp[1:36])
##  [1] 0.003714710 0.008928571 0.014880952 0.020833333 0.026785714 0.032738095
##  [7] 0.038690476 0.044642857 0.050595238 0.056547619 0.062500000 0.068452381
## [13] 0.074404762 0.080357143 0.086309524 0.092261905 0.098214286 0.104166667
## [19] 0.110119048 0.116071429 0.122023810 0.127976190 0.133928571 0.139880952
## [25] 0.145833333 0.151785714 0.157738095 0.163690476 0.169642857 0.175595238
## [31] 0.181547619 0.187500000 0.193452381 0.199404762 0.205357143 0.211309524
mu <- 0.00
sigma <- 1.00
Gauss_sd_quant <- qnorm(ppoints, mean=mu, sd=sigma)
show(Gauss_sd_quant[1:36])
##  [1] -2.7503931 -2.3685671 -2.1732447 -2.0368341 -1.9302859 -1.8419929
##  [7] -1.7660888 -1.6991777 -1.6391094 -1.5844329 -1.5341205 -1.4874168
## [13] -1.4437492 -1.4026733 -1.3638364 -1.3269543 -1.2917938 -1.2581616
## [19] -1.2258953 -1.1948571 -1.1649293 -1.1360100 -1.1080108 -1.0808544
## [25] -1.0544725 -1.0288048 -1.0037976 -0.9794029 -0.9555775 -0.9322826
## [31] -0.9094830 -0.8871466 -0.8652442 -0.8437493 -0.8226373 -0.8018858
Gauss_sd_prob <- pnorm(Gauss_sd_quant, mean=mu, sd=sigma)
show(Gauss_sd_prob[1:36])
##  [1] 0.002976190 0.008928571 0.014880952 0.020833333 0.026785714 0.032738095
##  [7] 0.038690476 0.044642857 0.050595238 0.056547619 0.062500000 0.068452381
## [13] 0.074404762 0.080357143 0.086309524 0.092261905 0.098214286 0.104166667
## [19] 0.110119048 0.116071429 0.122023810 0.127976190 0.133928571 0.139880952
## [25] 0.145833333 0.151785714 0.157738095 0.163690476 0.169642857 0.175595238
## [31] 0.181547619 0.187500000 0.193452381 0.199404762 0.205357143 0.211309524
PP_plot_df <- data.frame(t=MARWS_train_df$t, x=Gauss_sd_prob, y=y_sd_res, z=y_sd_res_pemp)
head(PP_plot_df)
##   t           x          y           z
## 1 1 0.002976190 -0.7563005 0.003714710
## 2 2 0.008928571  0.1738111 0.008928571
## 3 3 0.014880952 -1.2755179 0.014880952
## 4 4 0.020833333  0.1217104 0.020833333
## 5 5 0.026785714  1.1618046 0.026785714
## 6 6 0.032738095  0.3408027 0.032738095

The P-P plot of the residuals.

# library(qqplotr)
Data_df <- PP_plot_df
length <- nrow(Data_df)
mu <- 0.00
sigma <- 1.00
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "P-P plot of the Standardized Residual Component of the AU Red Wine Monthly Logarithm Sales Against the Standard Gaussian Distribution"))
subtitle_content <- bquote(paste("Data set size=", .(length), " points;    Theoretical Gaussian Distribution Parameters ", mu, " = ", .(mu), ", ", sigma, " = ", .(sigma)))
caption_content <- "Author: Roberto Monte"
distr <- "norm"
distr_pars <- list(mean=mu, sd=sigma)
x_name <- bquote("Theoretical Quantiles")
y_name <- bquote("Sample Probabilities")
x_breaks_min <- floor(Data_df$x[1])
x_breaks_max <- ceiling(Data_df$x[length])
x_breaks <- seq(from=x_breaks_min, to=x_breaks_max, by=0.5)
x_labs <- format(x_breaks, scientific=FALSE)
x_lims <- c(x_breaks_min,x_breaks_max)
y_breaks_num <- length(x_breaks)
y_binwidth <- round((max(Data_df$z)-min(Data_df$z))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$z)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$z)/y_binwidth))*y_binwidth
y_breaks <- c(round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3))
y_labs <- format(y_breaks, scientific=FALSE)
y_lims <- c(y_breaks_low, y_breaks_up)
shape_1 <- bquote("P-P plot")
fill_1 <- bquote("95% confidence bands")
fill_2 <- bquote("99% confidence bands")
col_1 <- bquote("y=x line")
col_2 <- bquote("regression line")
leg_shape_labs <- shape_1
leg_fill_labs <- c(fill_1, fill_2)
leg_col_labs <- c(col_1, col_2)
leg_shape_cols <- c("shape_1" = 19)
leg_fill_cols <- c("fill_1"="gold", "fill_2"="green")
leg_col_cols <- c("col_1"="black", "col_2"="red")
leg_shape_sort <- "shape_1"
leg_fill_sort <- c("fill_1", "fill_2")
leg_col_sort <- c("col_1", "col_2")
Res_PP_plot <- ggplot(Data_df, aes(sample=y)) +
  stat_pp_band(aes(fill="fill_2"), distribution=distr, dparams=distr_pars, conf = 0.99) +
  stat_pp_band(aes(fill="fill_1"), distribution=distr, dparams=distr_pars, conf = 0.95) +
  stat_pp_line(ab=c(0.00,1.00), aes(colour="col_1")) +
  stat_smooth(alpha=1, size=0.8, linetype="solid", aes(x=x, y=z, colour="col_2"),
              method="lm" , formula=y~x, se=FALSE, fullrange=FALSE) + 
  stat_pp_point(aes(shape="shape_1"), colour="blue", alpha=1, size=1.0, 
               distribution=distr, dparams=distr_pars) +
#  geom_point(aes(x=x, y=y, shape="shape_1"), colour="blue", alpha=1, size=1.0) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis=sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_shape_manual(name="Legend", labels=leg_shape_labs, values=leg_shape_cols, breaks=leg_shape_sort) +
  scale_fill_manual(name="", labels=leg_fill_labs, values=leg_fill_cols, breaks=leg_fill_sort) +
  scale_colour_manual(name="", labels=leg_col_labs, values=leg_col_cols, breaks=leg_col_sort) +
  guides(shape=guide_legend(order=1), fill=guide_legend(order=2), colour=guide_legend(order=3)) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=0, vjust=1),
        legend.key.width=unit(0.8,"cm"), legend.position="bottom")
plot(Res_PP_plot)
## Warning: The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

The P-P plot, too, yields visual evidence of Gaussianity in the residual component of the MARWS_log time series.

On the computational side, we apply four tests that may reject the null hypothesis of Gaussianity: the Shapiro-Wilk (SW), D'Agostino-Pearson (DP), Anderson-Darling (AD), and Jarque-Bera (JB) tests. It should be noted that these normality tests (and others) rely on the assumption that the data set has been generated by independent random sampling from the same distribution. Therefore, to apply them we implicitly rely on the previous results.

The SW test.

# Shapiro-Wilk (*SW*) test.
# library(stats)
z <- MARWS_log_dec_df$y_res
MARWS_log_res_SW <- shapiro.test(z)
show(MARWS_log_res_SW)
## 
##  Shapiro-Wilk normality test
## 
## data:  z
## W = 0.99604, p-value = 0.9385

By applying the SW test we cannot reject the null hypothesis of Gaussianity for the data set MARWS_log_dec_df$y_res.

The DP test.

# D'Agostino-Pearson (*DP*) test.
# library(fBasics)
z <- MARWS_log_dec_df$y_res
MARWS_log_res_DP <- dagoTest(z)
show(MARWS_log_res_DP)
## 
## Title:
##  D'Agostino Normality Test
## 
## Test Results:
##   STATISTIC:
##     Chi2 | Omnibus: 0.6528
##     Z3  | Skewness: -0.3737
##     Z4  | Kurtosis: 0.7164
##   P VALUE:
##     Omnibus  Test: 0.7215 
##     Skewness Test: 0.7086 
##     Kurtosis Test: 0.4738

By applying the DP test we cannot reject the null hypothesis of Gaussianity for the data set MARWS_log_dec_df$y_res.

The AD test.

# Anderson-Darling (*AD*) test.
# library(nortest)
z <- MARWS_log_dec_df$y_res
MARWS_log_res_AD <- ad.test(z)
show(MARWS_log_res_AD)
## 
##  Anderson-Darling normality test
## 
## data:  z
## A = 0.178, p-value = 0.9183

By applying the AD test we cannot reject the null hypothesis of Gaussianity for the data set MARWS_log_dec_df$y_res.

The JB test

# Jarque-Bera (*JB*) test.
# library(tseries)
z <- MARWS_log_dec_df$y_res
MARWS_log_res_JB <- jarque.bera.test(z)
show(MARWS_log_res_JB)
## 
##  Jarque Bera Test
## 
## data:  z
## X-squared = 0.34841, df = 2, p-value = 0.8401

By applying the JB test we cannot reject the null hypothesis of Gaussianity for the data set MARWS_log_dec_df$y_res.

We have collected highly significant evidence that the data set MARWS_log_dec_df$y_res has been generated by independent random sampling from a Gaussian distribution.

As a result of the visual evidence and the normality tests, we cannot reject the normality hypothesis for the residual component of the MARWS_log time series.

Summarizing, we cannot reject the null hypothesis that the residual component of the MARWS_log time series is a sample path (realization) of a process that is stationary in mean, stationary in variance (homoskedastic), with independent, Gaussian-distributed random variables; that is, a Gaussian white noise.
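This conclusion can be visualized with a quick simulation: a Gaussian white-noise path of the same length as the residual series. The sketch below uses a placeholder value for the estimated standard deviation (in practice one would use sd(MARWS_log_dec_df$y_res)); the specific value is an assumption here, chosen only for illustration.

```r
set.seed(42)
sigma_hat <- 0.075                         # placeholder for sd(MARWS_log_dec_df$y_res)
wn <- rnorm(168, mean=0, sd=sigma_hat)     # simulated Gaussian white-noise path
plot(wn, type="l", xlab="t", ylab="simulated residual",
     main="Simulated Gaussian white noise")
abline(h=c(-1.96, 1.96)*sigma_hat, col="red", lty=2)  # approximate 95% band
```

A path like this should be visually indistinguishable, in its statistical features, from the residual component plotted earlier.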

We are going to give a sharper definition and characterization of a Gaussian white noise. Before doing so, observe that, after the logarithm transformation, demeaning, and deseasonalizing, we ended up with a residual component that appears to be a path of a Gaussian white noise. Therefore, we can consider the analysis of the MARWS time series complete, and we are in a position to build the prediction bands for its future evolution.

A quick way to do this is to use the forecast() function from Hyndman's forecast package.

# Naive forecast of MARWS_log_stl
MARWS_log_stl <- stl(MARWS_log_ts, s.window="periodic")
class(MARWS_log_stl)
## [1] "stl"
# Showing the first 24 entries of the stl decomposition 
head(MARWS_log_stl$time.series, 24)
##             seasonal    trend    remainder
## Jan 1980 -0.59427976 6.777918 -0.043753227
## Feb 1980 -0.31235460 6.789941  0.037126315
## Mar 1980 -0.14204491 6.801964 -0.104562597
## Apr 1980 -0.04531184 6.813602  0.019554474
## May 1980  0.10239981 6.825240  0.110265860
## Jun 1980  0.11866702 6.837151  0.026116429
## Jul 1980  0.32495385 6.849062  0.009854683
## Aug 1980  0.31783330 6.861554 -0.040520039
## Sep 1980  0.07253890 6.874045  0.074499765
## Oct 1980 -0.02347288 6.886389  0.007137168
## Nov 1980  0.06129343 6.898733 -0.056279117
## Dec 1980  0.11977770 6.911771 -0.164615297
## Jan 1981 -0.59427976 6.924809 -0.057652056
## Feb 1981 -0.31235460 6.936586  0.159093938
## Mar 1981 -0.14204491 6.948363 -0.010612204
## Apr 1981 -0.04531184 6.953573  0.043510582
## May 1981  0.10239981 6.958784  0.028059393
## Jun 1981  0.11866702 6.956441  0.084961033
## Jul 1981  0.32495385 6.954098  0.076588877
## Aug 1981  0.31783330 6.942531  0.102914959
## Sep 1981  0.07253890 6.930964 -0.022497443
## Oct 1981 -0.02347288 6.916448 -0.070777448
## Nov 1981  0.06129343 6.901931 -0.047501130
## Dec 1981  0.11977770 6.889033 -0.039960531
# Transforming the stl object into a data frame
MARWS_train_log_df <- as.data.frame(as.xts(MARWS_log_stl$time.series))
# Plotting the stl decomposition
plot(MARWS_log_stl)

# Naive forecast of MARWS_log_stl
length <- nrow(MARWS_df)
TrnS_length <- floor(length*0.9)
forecast_length <- length-TrnS_length
MARWS_log_ts_naive_for <- forecast(MARWS_log_stl, h=forecast_length, method="naive", level=c(90,95))
class(MARWS_log_ts_naive_for)
## [1] "forecast"
head(MARWS_log_ts_naive_for)
## $method
## [1] "STL +  Random walk"
## 
## $model
## Call: rwf(y = x, h = h, drift = FALSE, level = level) 
## 
## Residual sd: 0.1337 
## 
## $lambda
## NULL
## 
## $x
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1980 6.139885 6.514713 6.555357 6.787845 7.037906 6.981935 7.183871 7.138867
## 1981 6.272877 6.783325 6.795706 6.951772 7.089243 7.160069 7.355641 7.363280
## 1982 6.298949 6.453625 6.689599 6.887553 6.925595 6.969791 7.247081 7.159292
## 1983 6.421622 6.582025 6.723832 6.884487 7.146772 7.270313 7.326466 7.443078
## 1984 6.549651 6.721426 6.903747 7.024649 7.284821 7.146772 7.469084 7.722235
## 1985 6.695799 6.904751 7.059618 7.094235 7.338238 7.321850 7.228388 7.641564
## 1986 6.658011 6.912743 7.084226 7.327781 7.338888 7.343426 7.657283 7.751905
## 1987 6.701960 7.047517 7.110696 7.433075 7.472501 7.469654 7.649693 7.631432
## 1988 6.873164 7.345365 7.338238 7.385231 7.639161 7.667158 7.974877 7.718241
## 1989 7.037028 7.265430 7.500529 7.474772 7.696213 7.633854 7.825245 7.669028
## 1990 6.877296 7.089243 7.448916 7.428333 7.613325 7.626083 7.799343 7.763446
## 1991 6.914731 7.417580 7.403670 7.325149 7.512618 7.699389 7.945201 7.780303
## 1992 7.100027 7.196687 7.606387 7.528332 7.577634 7.674153 7.949797 7.707063
## 1993 6.792344 7.128496 7.609367 7.721792 7.720905 7.720905 8.025189 8.110728
##           Sep      Oct      Nov      Dec
## 1980 7.021084 6.870053 6.903747 6.866933
## 1981 6.981006 6.822197 6.915723 6.968850
## 1982 7.006695 6.906755 6.903747 6.922644
## 1983 7.048386 6.839476 7.055313 7.097549
## 1984 7.096721 7.123673 7.142827 7.510978
## 1985 7.213032 7.336937 7.330405 7.226936
## 1986 7.375256 7.212294 7.347944 7.385851
## 1987 7.606885 7.548029 7.582738 7.689829
## 1988 7.540622 7.461066 7.510978 7.532624
## 1989 7.651120 7.586804 7.687539 7.759614
## 1990 7.709757 7.524021 7.671827 7.734559
## 1991 7.743270 7.487174 7.624131 7.682943
## 1992 7.687997 7.596894 7.778630 7.909857
## 1993 7.547502 7.647786 7.772332 7.837949
## 
## $fitted
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1980       NA 6.421810 6.685022 6.652090 6.935557 7.054173 7.188222 7.176750
## 1981 6.152876 6.554802 6.953635 6.892439 7.099484 7.105510 7.366356 7.348521
## 1982 6.254793 6.580874 6.623935 6.786332 7.035264 6.941862 7.176078 7.239960
## 1983 6.208586 6.703547 6.752335 6.820566 7.032198 7.163039 7.476600 7.319345
## 1984 6.383491 6.831576 6.891735 7.000480 7.172361 7.301088 7.353059 7.461963
## 1985 6.796920 6.977724 7.075060 7.156351 7.241946 7.354505 7.528137 7.221268
## 1986 6.512879 6.939936 7.083053 7.180959 7.475492 7.355155 7.549713 7.650162
## 1987 6.671794 6.983886 7.217827 7.207429 7.580787 7.488768 7.675941 7.642572
## 1988 6.975771 7.155089 7.515675 7.434971 7.532943 7.655428 7.873445 7.967756
## 1989 6.818566 7.318953 7.435739 7.597263 7.622484 7.712480 7.840140 7.818125
## 1990 7.045557 7.159221 7.259553 7.545649 7.576045 7.629592 7.832370 7.792223
## 1991 7.020501 7.196656 7.587890 7.500403 7.472861 7.528885 7.905676 7.938081
## 1992 6.968886 7.381952 7.366996 7.703120 7.676043 7.593901 7.880440 7.942677
## 1993 7.195799 7.074270 7.298806 7.706100 7.869503 7.737172 7.927192 8.018069
##           Sep      Oct      Nov      Dec
## 1980 6.893573 6.925072 6.954820 6.962232
## 1981 7.117985 6.884994 6.906964 6.974208
## 1982 6.913998 6.910683 6.991521 6.962232
## 1983 7.197784 6.952375 6.924243 7.113797
## 1984 7.476940 7.000710 7.208439 7.201312
## 1985 7.396270 7.117020 7.421703 7.388889
## 1986 7.506611 7.279244 7.297061 7.406428
## 1987 7.386137 7.510873 7.632795 7.641223
## 1988 7.472947 7.444610 7.545832 7.569462
## 1989 7.423734 7.555108 7.671570 7.746023
## 1990 7.518152 7.613745 7.608788 7.730311
## 1991 7.535009 7.647258 7.571940 7.682615
## 1992 7.461768 7.591985 7.681661 7.837114
## 1993 7.865433 7.451490 7.732552 7.830816
## 
## $residuals
##                Jan           Feb           Mar           Apr           May
## 1980            NA  0.0929029730 -0.1296654812  0.1357550146  0.1023493295
## 1981  0.1200011798  0.2285230284 -0.1579291075  0.0593333133 -0.0102406610
## 1982  0.0441563263 -0.1272494136  0.0656645882  0.1012202265 -0.1096690262
## 1983  0.2130358341 -0.1215222945 -0.0285023802  0.0639211353  0.1145738757
## 1984  0.1661593494 -0.1101502071  0.0120118747  0.0241686969  0.1124602305
## 1985 -0.1011213772 -0.0729733127 -0.0154428238 -0.0621158583  0.0962916525
## 1986  0.1451324851 -0.0271933910  0.0011739195  0.1468210404 -0.1366040562
## 1987  0.0301667456  0.0636316897 -0.1071307805  0.2256461500 -0.1082862558
## 1988 -0.1026073768  0.1902758406 -0.1774363725 -0.0497403029  0.1062185969
## 1989  0.2184614536 -0.0535230571  0.0647900800 -0.1224903789  0.0737288053
## 1990 -0.1682606215 -0.0699780821  0.1893632654 -0.1173159843  0.0372801337
## 1991 -0.1057704939  0.2209243441 -0.1842197945 -0.1752544080  0.0397569351
## 1992  0.1311414545 -0.1852657614  0.2393911368 -0.1747886990 -0.0984095858
## 1993 -0.4034547821  0.0542263526  0.3105609102  0.0156921629 -0.1485981765
##                Jun           Jul           Aug           Sep           Oct
## 1980 -0.0722384875 -0.0043508015 -0.0378831616  0.1275113643 -0.0550187737
## 1981  0.0545588513 -0.0107149440  0.0147590375 -0.1369794463 -0.0627965713
## 1982  0.0279282715  0.0710030753 -0.0806681262  0.0926977220 -0.0039286694
## 1983  0.1072735054 -0.1501341116  0.1237333141 -0.1493975657 -0.1128981917
## 1984 -0.1543159344  0.1160248661  0.2602714133 -0.3802189663  0.1229631855
## 1985 -0.0326556375 -0.2997481016  0.4202965432 -0.1832383815  0.2199170327
## 1986 -0.0117291059  0.1075697244  0.1017430937 -0.1313551551 -0.0669495307
## 1987 -0.0191137731 -0.0262483886 -0.0111404056  0.2207472666  0.0371562175
## 1988  0.0117298824  0.1014318059 -0.2495153950  0.0676749767  0.0164557645
## 1989 -0.0786262809 -0.0148951076 -0.1490964493  0.2273862871  0.0316951383
## 1990 -0.0035094227 -0.0330261992 -0.0287764559  0.1916048757 -0.0897236705
## 1991  0.1705046603  0.0395248868 -0.1577774910  0.2082610129 -0.1600842278
## 1992  0.0802518874  0.0693574555 -0.2356140072  0.2262289110  0.0049090505
## 1993 -0.0162672013  0.0979972306  0.0926588146 -0.3179315002  0.1962961414
##                Nov           Dec
## 1980 -0.0510724604 -0.0952982409
## 1981  0.0087597518 -0.0053573381
## 1982 -0.0877738273 -0.0395876339
## 1983  0.1310700989 -0.0162482606
## 1984 -0.0656116903  0.3096660830
## 1985 -0.0912980081 -0.1619534612
## 1986  0.0508830484 -0.0205770129
## 1987 -0.0500567873  0.0486059120
## 1988 -0.0348540686 -0.0368384011
## 1989  0.0159689248  0.0135911167
## 1990  0.0630390764  0.0042477786
## 1991  0.0521905852  0.0003283164
## 1992  0.0969694029  0.0727422521
## 1993  0.0397792235  0.0071330730
MARWS_log_ts
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1980 6.139885 6.514713 6.555357 6.787845 7.037906 6.981935 7.183871 7.138867
## 1981 6.272877 6.783325 6.795706 6.951772 7.089243 7.160069 7.355641 7.363280
## 1982 6.298949 6.453625 6.689599 6.887553 6.925595 6.969791 7.247081 7.159292
## 1983 6.421622 6.582025 6.723832 6.884487 7.146772 7.270313 7.326466 7.443078
## 1984 6.549651 6.721426 6.903747 7.024649 7.284821 7.146772 7.469084 7.722235
## 1985 6.695799 6.904751 7.059618 7.094235 7.338238 7.321850 7.228388 7.641564
## 1986 6.658011 6.912743 7.084226 7.327781 7.338888 7.343426 7.657283 7.751905
## 1987 6.701960 7.047517 7.110696 7.433075 7.472501 7.469654 7.649693 7.631432
## 1988 6.873164 7.345365 7.338238 7.385231 7.639161 7.667158 7.974877 7.718241
## 1989 7.037028 7.265430 7.500529 7.474772 7.696213 7.633854 7.825245 7.669028
## 1990 6.877296 7.089243 7.448916 7.428333 7.613325 7.626083 7.799343 7.763446
## 1991 6.914731 7.417580 7.403670 7.325149 7.512618 7.699389 7.945201 7.780303
## 1992 7.100027 7.196687 7.606387 7.528332 7.577634 7.674153 7.949797 7.707063
## 1993 6.792344 7.128496 7.609367 7.721792 7.720905 7.720905 8.025189 8.110728
##           Sep      Oct      Nov      Dec
## 1980 7.021084 6.870053 6.903747 6.866933
## 1981 6.981006 6.822197 6.915723 6.968850
## 1982 7.006695 6.906755 6.903747 6.922644
## 1983 7.048386 6.839476 7.055313 7.097549
## 1984 7.096721 7.123673 7.142827 7.510978
## 1985 7.213032 7.336937 7.330405 7.226936
## 1986 7.375256 7.212294 7.347944 7.385851
## 1987 7.606885 7.548029 7.582738 7.689829
## 1988 7.540622 7.461066 7.510978 7.532624
## 1989 7.651120 7.586804 7.687539 7.759614
## 1990 7.709757 7.524021 7.671827 7.734559
## 1991 7.743270 7.487174 7.624131 7.682943
## 1992 7.687997 7.596894 7.778630 7.909857
## 1993 7.547502 7.647786 7.772332 7.837949
# length(MARWS_log_ts)
MARWS_log_full_ts <- ts(log(MARWS_df$RWS), start=c(1980,1), frequency = 12)
# length(MARWS_log_full_ts)
MARWS_log_train_ts <- ts(log(MARWS_df$RWS), start=c(1980,1), end=c(1993,12), frequency = 12)
# length(MARWS_log_train_ts)
MARWS_log_val_ts <- ts(log(MARWS_df$RWS[TrnS_length+1:forecast_lenght]), start=c(1994,1), frequency = 12)
# length(MARWS_log_val_ts)
# Plotting the full RWS data set
cols <- c("black", "brown", "blue")
forecast::autoplot(MARWS_log_ts_naive_for, col="black", lwd = 0.8,  fcol="brown", flwd = 1.0) + 
  forecast::autolayer(MARWS_log_train_ts, series = "MARWS in-sample set") +
  forecast::autolayer(MARWS_log_val_ts, series = "MARWS out-of-sample set") +
  forecast::autolayer(MARWS_log_ts_naive_for[["mean"]], series = "MARWS out-of-sample forecast") +
    xlab("Years") +
  ylab("MARWS (logarithm)") +
  guides(colour=guide_legend(title="Data series"), 
       fill=guide_legend(title="Prediction interval")) +
  scale_color_manual(values=cols)

Alternatively, we can forecast the seasonally adjusted component with an ARIMA model instead of the naive (random walk) method:

# ARIMA forecast of MARWS_log_stl
MARWS_log_stl <- stl(MARWS_log_ts, s.window="periodic")
# class(MARWS_log_stl)
# Showing the first 24 entries of the stl decomposition 
# head(MARWS_log_stl$time.series, 24)
# Transforming the stl object into a df object.
MARWS_train_log_df <- as.data.frame(as.xts(MARWS_log_stl$time.series))
# Plotting the stl decomposition
# plot(MARWS_log_stl)
# ARIMA forecast of MARWS_log_stl
length <- nrow(MARWS_df)
TrnS_length <- floor(length*0.9)
forecast_lenght <- length-TrnS_length
MARWS_log_ts_arima_for <- forecast(MARWS_log_stl, h=forecast_lenght, method="arima", level=c(90,95))
class(MARWS_log_ts_arima_for)
## [1] "forecast"
head(MARWS_log_ts_arima_for)
## $method
## [1] "STL +  ARIMA(0,1,1) with drift"
## 
## $model
## Series: x 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##           ma1   drift
##       -0.8342  0.0053
## s.e.   0.0439  0.0014
## 
## sigma^2 = 0.01102:  log likelihood = 139.91
## AIC=-273.81   AICc=-273.66   BIC=-264.46
## 
## $level
## [1] 90 95
## 
## $mean
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1994 7.097142 7.384351 7.559944 7.661960 7.814955 7.836506 8.048076 8.046239
## 1995 7.160543 7.447751 7.623344 7.725361 7.878356 7.899906 8.111476         
##           Sep      Oct      Nov      Dec
## 1994 7.806228 7.715499 7.805549 7.869317
## 1995                                    
## 
## $lower
##               90%      95%
## Jan 1994 6.924507 6.891435
## Feb 1994 7.209359 7.175836
## Mar 1994 7.382628 7.348658
## Apr 1994 7.482349 7.447941
## May 1994 7.633078 7.598236
## Jun 1994 7.652391 7.617120
## Jul 1994 7.861750 7.826055
## Aug 1994 7.857728 7.821614
## Sep 1994 7.615557 7.579029
## Oct 1994 7.522692 7.485756
## Nov 1994 7.610630 7.573288
## Dec 1994 7.672307 7.634566
## Jan 1995 6.961465 6.923328
## Feb 1995 7.246627 7.208097
## Mar 1995 7.420194 7.381276
## Apr 1995 7.520205 7.480902
## May 1995 7.671213 7.631530
## Jun 1995 7.690796 7.650736
## Jul 1995 7.900417 7.859983
## 
## $upper
##               90%      95%
## Jan 1994 7.269778 7.302850
## Feb 1994 7.559342 7.592866
## Mar 1994 7.737260 7.771229
## Apr 1994 7.841571 7.875980
## May 1994 7.996832 8.031675
## Jun 1994 8.020621 8.055892
## Jul 1994 8.234402 8.270097
## Aug 1994 8.234750 8.270864
## Sep 1994 7.996899 8.033426
## Oct 1994 7.908306 7.945243
## Nov 1994 8.000469 8.037810
## Dec 1994 8.066326 8.104068
## Jan 1995 7.359620 7.397758
## Feb 1995 7.648875 7.687405
## Mar 1995 7.826494 7.865412
## Apr 1995 7.930516 7.969819
## May 1995 8.085498 8.125181
## Jun 1995 8.109016 8.149076
## Jul 1995 8.322536 8.362969
# MARWS_log_ts
# length(MARWS_log_ts)
MARWS_log_full_ts <- ts(log(MARWS_df$RWS), start=c(1980,1), frequency = 12)
# length(MARWS_log_full_ts)
MARWS_log_train_ts <- ts(log(MARWS_df$RWS), start=c(1980,1), end=c(1993,12), frequency = 12)
# length(MARWS_log_train_ts)
MARWS_log_val_ts <- ts(log(MARWS_df$RWS[TrnS_length+1:forecast_lenght]), start=c(1994,1), frequency = 12)
# length(MARWS_log_val_ts)
# Plotting the full RWS data set
cols <- c("black", "brown", "blue")
forecast::autoplot(MARWS_log_ts_arima_for, col="black", lwd = 0.8,  fcol="brown", flwd = 1.0) + 
  forecast::autolayer(MARWS_log_train_ts, series = "MARWS in-sample set") +
  forecast::autolayer(MARWS_log_val_ts, series = "MARWS out-of-sample set") +
  forecast::autolayer(MARWS_log_ts_arima_for[["mean"]], series = "MARWS out-of-sample forecast") +
    xlab("Years") +
  ylab("MARWS (logarithm)") +
  guides(colour=guide_legend(title="Data series"), 
       fill=guide_legend(title="Prediction interval")) +
  scale_color_manual(values=cols)

We will show a more detailed way to obtain the above forecasts. To do this, we need a deeper knowledge of the Gaussian white noise model.

Let \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) be a stochastic process on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\). We assume that \(\mathbf{W}\) is a process of order \(2\), which means that all random variables in \(\mathbf{W}\) have finite moments of order \(2\) (see Definition …).

Definition 3.16 (Strong White Noise) We say that \(\mathbf{W}\) is an \(N\)-variate real strong white noise (SWN), or independent identically distributed noise (IIDN), if the random variables in \(\mathbf{W}\) are independent and identically distributed with mean \(0\). In case \(N=1\), we usually omit \(N\).

To denote that \(\mathbf{W}\) is an \(N\)-variate strong white noise we write \(\mathbf{W}\sim {SWN}^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\) or \(\mathbf{W}\sim {IID}^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\), where \(\Sigma_{\mathbf{W}}^{2}\) is the common variance-covariance matrix of the random variables in \(\mathbf{W}\). In case \(N=1\), we write \(\Sigma_{\mathbf{W}}^{2}\equiv\sigma_{\mathbf{W}}^{2}\) for the common variance of the random variables in \(\mathbf{W}\) and write \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\) or \(\mathbf{W}\sim IID\left(\sigma_{\mathbf{W}}^{2}\right)\), omitting \(N\).

Definition 3.17 (Gaussian Strong White Noise) We say that \(\mathbf{W}\) is a Gaussian (strong) white noise if it is a strong white noise whose random variables are Gaussian distributed.

To denote that \(\mathbf{W}\) is a Gaussian white noise we write \(\mathbf{W}\sim GWN^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\), where \(\Sigma_{\mathbf{W}}^{2}\) is the common variance-covariance matrix of the \(N\)-variate random variables in \(\mathbf{W}\). In case \(N=1\), we set \(\Sigma_{\mathbf{W}}^{2}\equiv\sigma_{\mathbf{W}}^{2}\) and omit \(N\).
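
As an illustration, a univariate Gaussian white noise is easy to simulate. The following base-R sketch (with an assumed standard deviation \(\sigma_{\mathbf{W}}=2\) and a hypothetical sample size) draws one realization and plots it.

```r
# A minimal sketch: one realization of a univariate Gaussian white noise
# W ~ GWN(sigma_W^2); sigma_W and T_len are hypothetical choices.
set.seed(123)                       # for reproducibility
sigma_W <- 2                        # assumed common standard deviation
T_len   <- 200                      # assumed sample size
w       <- rnorm(T_len, mean = 0, sd = sigma_W)
w_ts    <- ts(w)                    # coerce the vector to a ts object
plot(w_ts, ylab = expression(W[t]), main = "Simulated Gaussian white noise")
```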

Let \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\sim {SWN}^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\).

Theorem 3.1 (Strong Stationarity of Strong White Noises) The process \(\mathbf{W}\) is strong-sense stationary and ergodic in the wide sense (see Definitions…).

Proposition 3.4 (Functions of Strong White Noises) The mean [resp. variance-covariance] function \(\mu_{\mathbf{W}}:\mathbb{T}\rightarrow\mathbb{R}^{N}\) [resp. \(\Sigma_{\mathbf{W}}^{2}:\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\)] satisfies \[\begin{equation} \mu_{\mathbf{W}}\left(t\right)=0,\quad\text{[resp. }\Sigma_{\mathbf{W}}^{2}\left(t\right)=\Sigma_{\mathbf{W}}^{2}\text{]}, \end{equation}\] for every \(t\in\mathbb{T}\). The autocovariance function \(\Gamma_{\mathbf{W}}:\mathbb{T}^{2}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) and the autocorrelation function \(\mathrm{P}_{\mathbf{W}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) satisfy \[\begin{equation} \Gamma_{\mathbf{W}}\left(s,t\right)=\mathrm{P}_{\mathbf{W}}\left(s,t\right)=0, \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(s\neq t\). In particular, when \(N=1\), the autocovariance function \(\gamma_{\mathbf{W}}:\mathbb{T}^{2}\rightarrow\mathbb{R}\) and the autocorrelation function \(\rho_{\mathbf{W}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}\) satisfy \[\begin{equation} \gamma_{\mathbf{W}}\left(s,t\right)=\rho_{\mathbf{W}}\left(s,t\right)=0, \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(s\neq t\).

Proposition 3.5 (Functions of Strong White Noises) Fixed any \(t_{0}\in\mathbb{T}\), write \(\mathbb{T}_{0}\equiv\left\{\tau\in\mathbb{R}:t_{0}+\tau\in\mathbb{T}\right\}\). The reduced autocovariance function \(\Gamma_{\mathbf{W},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) and the reduced autocorrelation function \(\mathrm{P}_{\mathbf{W},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) of \(\mathbf{W}\) referred to \(t_{0}\) satisfy \[\begin{equation} \Gamma_{\mathbf{W},t_{0}}\left(\tau\right)=\left\{ \begin{array} [c]{ll} \Sigma_{\mathbf{W}}^{2}, & \text{if }\tau=0,\\ 0, & \text{if }\tau\neq 0, \end{array} \right. \quad\text{and}\quad \mathrm{P}_{\mathbf{W},t_{0}}\left(\tau\right)=\left\{ \begin{array} [c]{ll} I_{N}, & \text{if }\tau=0,\\ 0, & \text{if }\tau\neq 0, \end{array} \right. \tag{3.42} \end{equation}\] where \(I_{N}\) is the identity matrix in \(\mathbb{R}^{N}\times\mathbb{R}^{N}\). In particular, when \(N=1\), the reduced autocovariance function \(\gamma_{\mathbf{W},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}\) and the reduced autocorrelation function \(\rho_{\mathbf{W},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}\) of \(\mathbf{W}\) referred to \(t_{0}\) satisfy \[\begin{equation} \gamma_{\mathbf{W},t_{0}}\left(\tau\right)=\left\{ \begin{array} [c]{ll} \sigma_{\mathbf{W}}^{2}, & \text{if }\tau=0,\\ 0, & \text{if }\tau\neq 0, \end{array} \right. \quad\text{and}\quad \rho_{\mathbf{W},t_{0}}\left(\tau\right)=\left\{ \begin{array} [c]{ll} 1, & \text{if }\tau=0,\\ 0, & \text{if }\tau\neq 0. \end{array} \right. \tag{3.43} \end{equation}\]

Proposition 3.6 (Functions of Strong White Noises) Assume that \(\mathbb{T}\equiv\mathbb{Z}\). The partial autocorrelation function \(\Phi_{\mathbf{W}}:\mathbb{Z}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) satisfies \[\begin{equation} \Phi_{\mathbf{W}}\left(\tau\right)=0, \end{equation}\] for every \(\tau\geq1\).
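
Propositions 3.4 and 3.6 suggest a quick empirical check: for a simulated strong white noise, the sample autocorrelations and partial autocorrelations at nonzero lags should be statistically indistinguishable from zero. A base-R sketch, with a hypothetical sample size:

```r
set.seed(42)
w <- rnorm(500)                                   # a simulated univariate SWN
r <- acf(w,  lag.max = 20, plot = FALSE)$acf[-1]  # sample ACF at lags 1..20
p <- pacf(w, lag.max = 20, plot = FALSE)$acf      # sample PACF at lags 1..20
# Proportion of sample values inside the approximate 95% band +/- 1.96/sqrt(T)
mean(abs(r) < 1.96 / sqrt(500))
mean(abs(p) < 1.96 / sqrt(500))
```

Roughly 95% of the sample values should fall inside \(\pm 1.96/\sqrt{T}\), the band that acf() and pacf() draw on their plots.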

3.1 Parameter Estimation

Let \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) be an \(N\)-variate strong white noise on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\). For the sake of simplicity, assume that \(\mathbb{T}\equiv\mathbb{N}_{0}\) or \(\mathbb{T}\equiv\mathbb{Z}\). Write \(\mu_{\mathbf{W}}\) [resp. \(\Sigma_{\mathbf{W}}^{2}\)] for the constant value of the mean [resp. variance-covariance] function of \(\mathbf{W}\).

Definition 3.18 (Time average estimator) Fixed any \(T\in\mathbb{N}\), we call the time average estimator of size \(T\) of \(\mathbf{W}\) the statistic \[\begin{equation} \bar{\mathbf{W}}_{T}\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \frac{1}{T}\sum\limits_{t=1}^{T}W_{t}, & \text{if }\mathbb{T}\equiv \mathbb{N}_{0},\\ \frac{1}{2T+1}\sum\limits_{t=-T}^{T}W_{t}, & \text{if }\mathbb{T} \equiv\mathbb{Z}. \end{array} \right. \tag{3.44} \end{equation}\]

Definition 3.19 (Time Variance Covariance estimator) Fixed any \(T\in\mathbb{N}\), we call the time variance-covariance estimator of size \(T\) of \(\mathbf{W}\) the statistic \[\begin{equation} S_{\mathbf{W},T}^{2}\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \frac{1}{T}\sum\limits_{t=1}^{T}\left(W_{t}-\bar{W}_{T}\right)\left(W_{t}-\bar{W}_{T}\right)^{\intercal}, & \text{if }\mathbb{T}\equiv\mathbb{N}_{0},\\ \frac{1}{2T+1}\sum\limits_{t=-T}^{T}\left(W_{t}-\bar{W}_{T}\right)\left(W_{t}-\bar{W}_{T}\right)^{\intercal}, & \text{if }\mathbb{T}\equiv\mathbb{Z}. \end{array} \right. \tag{3.45} \end{equation}\] In case \(N=1\), the time variance-covariance estimator of size \(T\) of \(\mathbf{W}\) is more simply referred to as time variance estimator of size \(T\) of \(\mathbf{W}\) and it is given by \[\begin{equation} S_{\mathbf{W},T}^{2}\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \frac{1}{T}\sum\limits_{t=1}^{T}\left(W_{t}-\bar{W}_{T}\right)^{2}, & \text{if }\mathbb{T}\equiv\mathbb{N}_{0},\\ \frac{1}{2T+1}\sum\limits_{t=-T}^{T}\left(W_{t}-\bar{W}_{T}\right)^{2}, & \text{if }\mathbb{T}\equiv\mathbb{Z}. \end{array} \right. \tag{3.46} \end{equation}\]
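
The estimators in Definitions 3.18 and 3.19 can be computed directly in base R. A sketch for the case \(\mathbb{T}\equiv\mathbb{N}_{0}\), with hypothetical simulated data; note that R's var() uses the \(1/\left(T-1\right)\) convention, so the \(1/T\) version in (3.46) is computed by hand.

```r
set.seed(1)
T_len <- 300                                 # hypothetical sample size
w     <- rnorm(T_len, mean = 0, sd = 1.5)    # simulated SWN realization
w_bar <- mean(w)                             # time average estimator (3.44)
s2_w  <- sum((w - w_bar)^2) / T_len          # time variance estimator (3.46), 1/T version
c(w_bar = w_bar, s2_w = s2_w)
# var(w) uses the 1/(T-1) convention instead:
all.equal(s2_w, var(w) * (T_len - 1) / T_len)
```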

Definition 3.20 (Time Autocovariance estimator) Fixed any \(T\in\mathbb{N}\), we call the time autocovariance estimator of size \(T\) and lag \(\tau\) of \(\mathbf{W}\) the statistic \[\begin{equation} G_{\mathbf{W},T}\left(\tau\right)\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \frac{1}{T}\sum\limits_{t=1}^{T-\tau}\left(W_{t}-\bar{W}_{T}\right) \left(W_{t+\tau}-\bar{W}_{T}\right)^{\intercal}, & \text{if }\mathbb{T}\equiv\mathbb{N}_{0},\\ \frac{1}{2T+1}\sum\limits_{t=-T}^{T-\tau}\left(W_{t}-\bar{W}_{T}\right) \left(W_{t+\tau}-\bar{W}_{T}\right)^{\intercal}, & \text{if }\mathbb{T}\equiv\mathbb{Z}, \end{array} \right. \quad\forall\tau=0,1,\dots,T-1. \tag{3.47} \end{equation}\] In case \(N=1\), the time autocovariance estimator of size \(T\) and lag \(\tau\) of \(\mathbf{W}\) is given by \[\begin{equation} G_{\mathbf{W},T}\left(\tau\right)\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \frac{1}{T}\sum\limits_{t=1}^{T-\tau}\left(W_{t}-\bar{W}_{T}\right)\left(W_{t+\tau}-\bar{W}_{T}\right), & \text{if }\mathbb{T}\equiv\mathbb{N}_{0},\\ \frac{1}{2T+1}\sum\limits_{t=-T}^{T-\tau}\left(W_{t}-\bar{W}_{T}\right)\left(W_{t+\tau}-\bar{W}_{T}\right), & \text{if }\mathbb{T}\equiv\mathbb{Z}, \end{array} \right. \quad\forall\tau=0,1,\dots,T-1. \tag{3.48} \end{equation}\]

Note that in Equations (3.47) and (3.48) the factor \(1/\left(T-\tau\right)\) [resp. \(1/\left(2T+1-\tau\right)\)] is sometimes used in place of \(1/T\) [resp. \(1/\left(2T+1\right)\)].

Clearly, \[\begin{equation} G_{\mathbf{W},T}\left(0\right)=S_{\mathbf{W},T}^{2}. \end{equation}\]

Definition 3.21 (Time Autocorrelation estimator) Fixed any \(T\in\mathbb{N}\), we call time autocorrelation estimator of size \(T\) and shift (lag) \(\tau\) of \(\mathbf{W}\) the statistic \[\begin{equation} R_{\mathbf{W},T}\left(\tau\right)\overset{\text{def}}{=} \operatorname*{diag}\left(G_{\mathbf{W},T}\left(0\right)\right)^{-\frac{1}{2}} G_{\mathbf{W},T}\left(\tau\right) \operatorname*{diag}\left(G_{\mathbf{W},T}\left(0\right)\right)^{-\frac{1}{2}}, \quad\forall\tau=0,1,\dots,T-1. \tag{3.49} \end{equation}\] where \(\operatorname*{diag}\left(G_{\mathbf{W},T}\left(0\right)\right)\) is the diagonal matrix having for diagonal entries the corresponding diagonal entries of \(G_{\mathbf{W},T}\left(0\right)\). In particular, when \(N=1\) we have \[\begin{equation} R_{\mathbf{W},T}\left(\tau\right)\overset{\text{def}}{=} \frac{G_{\mathbf{W},T}\left(\tau\right)}{G_{\mathbf{W},T}\left(0\right)} =\frac{G_{\mathbf{W},T}\left(\tau\right)}{S_{\mathbf{W},T}^{2}}, \quad\forall\tau=0,1,\dots,T-1. \tag{3.50} \end{equation}\]
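
For \(N=1\), estimator (3.50) matches what R's acf() computes, since acf() also demeans by the full-sample average and uses the \(1/T\) denominator of (3.48). A sketch with hypothetical simulated data:

```r
set.seed(7)
w     <- rnorm(400)                       # simulated univariate SWN
T_len <- length(w)
w_bar <- mean(w)
# Time autocovariance estimator (3.48) at lag tau, 1/T convention
g <- function(tau) {
  sum((w[1:(T_len - tau)] - w_bar) * (w[(1 + tau):T_len] - w_bar)) / T_len
}
r_manual <- g(3) / g(0)                                  # estimator (3.50) at lag 3
r_acf <- acf(w, lag.max = 3, plot = FALSE)$acf[4]        # acf() value at lag 3
all.equal(r_manual, as.numeric(r_acf))
```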

Proposition 3.7 (Time average estimator) For any \(T\in\mathbb{N}\), the time average estimator \(\bar{\mathbf{W}}_{T}\) of size \(T\) of \(\mathbf{W}\) is an unbiased estimator of \(\mu_{\mathbf{W}}=0\) and its mean squared error is given by \[\begin{equation} \operatorname*{trace}\left(Var\left(\bar{\mathbf{W}}_{T}\right)\right)=\left\{ \begin{array} [c]{ll} \frac{1}{T}\left(\sum\limits_{\tau=-\left(T-1\right)}^{T-1} \left(1-\frac{\left\vert\tau\right\vert}{T}\right) \operatorname*{trace}\left(\Gamma_{\mathbf{W},0}\left(\tau\right)\right)\right),& \text{if } \mathbb{T}\equiv\mathbb{N}_{0},\\ \frac{1}{2T+1}\left(\sum\limits_{\tau=-2T}^{2T} \left(1-\frac{\left\vert\tau\right\vert}{2T+1}\right) \operatorname*{trace}\left(\Gamma_{\mathbf{W},0}\left(\tau\right)\right)\right), & \text{if } \mathbb{T}\equiv\mathbb{Z}, \end{array} \right. \tag{3.51} \end{equation}\] where \(\Gamma_{\mathbf{W},0}\left(\tau\right)\) is the value of the reduced autocovariance function of \(\mathbf{W}\), referred to \(t_{0}=0\), at shift (lag) \(\tau\) (see Definition…).

Corollary 3.1 (Time average estimator) In case \(N=1\), the mean squared error of the time average estimator \(\bar{\mathbf{W}}_{T}\) is given by \[\begin{equation} \mathbf{D}^{2}\left[\bar{\mathbf{W}}_{T}\right]=\left\{ \begin{array} [c]{ll} \frac{\sigma_{\mathbf{W}}^{2}}{T}\left(1+2\sum\limits_{\tau=1}^{T-1} \left(1-\frac{\tau}{T}\right)\rho_{\mathbf{W},0}\left(\tau\right)\right), & \text{if }\mathbb{T}\equiv\mathbb{N}_{0}\\ \frac{\sigma_{\mathbf{W}}^{2}}{2T+1}\left(1+2\sum\limits_{\tau=1}^{2T} \left(1-\frac{\tau}{2T+1}\right)\rho_{\mathbf{W},0}\left(\tau\right)\right), & \text{if }\mathbb{T}\equiv\mathbb{Z}, \end{array} \right. \tag{3.52} \end{equation}\] where \(\rho_{\mathbf{W},0}\left(\tau\right)\) is the value of the reduced autocorrelation function of \(\mathbf{W}\), referred to \(t_{0}=0\), at shift (lag) \(\tau\) (see Definition…).

Proposition 3.8 (Confidence Intervals for Time Avg. of a GSWN) In case \(N=1\), assume that \(\mathbf{W}\) is Gaussian, that is \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\), for some \(\sigma_{\mathbf{W}}>0\). Then a confidence interval for \(\mu_{\mathbf{W}}=0\) at the confidence level \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(\bar{W}_{T}-t_{T-1,\alpha/2}\frac{S_{\mathbf{W},T}}{\sqrt{T}},\ \bar{W}_{T}+t_{T-1,\alpha/2}\frac{S_{\mathbf{W},T}}{\sqrt{T}}\right), \tag{3.53} \end{equation}\] where \(t_{T-1,\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the Student random variable with \(T-1\) degrees of freedom and \(S_{\mathbf{W},T}\equiv\sqrt{S_{\mathbf{W},T}^{2}}\) is the time standard deviation of \(\mathbf{W}\). Hence, a realization of the confidence interval (3.53) is given by \[\begin{equation} \left(\bar{w}_{T}-t_{T-1,\alpha/2}\frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{w}_{T}+t_{T-1,\alpha/2}\frac{s_{\mathbf{W},T}}{\sqrt{T}}\right), \tag{3.54} \end{equation}\] where \(\bar{w}_{T}\) [resp. \(s_{\mathbf{W},T}\)] is the realization of the time average estimator \(\bar{\mathbf{W}}_{T}\) [resp. time standard deviation \(S_{\mathbf{W},T}\)] of \(\mathbf{W}\). In addition, there is evidence against the null hypothesis \(H_{0}:\mu_{\mathbf{W}}=0\) at the significance level of \(100\alpha\%\), for any \(\alpha\in\left(0,1\right)\), when \[\begin{equation} \left\vert \frac{\bar{w}_{T}}{s_{\mathbf{W},T}/\sqrt{T}}\right\vert>t_{T-1,\alpha/2} \Leftrightarrow \mathbf{P}\left(\left\vert T_{T-1}\right\vert\geq\left\vert \frac{\bar{w}_{T}}{s_{\mathbf{W},T}/\sqrt{T}}\right\vert\right)<\alpha, \tag{3.55} \end{equation}\] where \(T_{T-1}\) is the standard Student random variable with \(T-1\) degrees of freedom.
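
A base-R sketch of the interval (3.54) and the test (3.55), with hypothetical simulated data. Here \(s_{\mathbf{W},T}\) is computed with the \(1/T\) convention of Definition 3.19; t.test() would instead use the \(1/\left(T-1\right)\) convention and give a slightly wider interval.

```r
set.seed(11)
T_len <- 100                                  # hypothetical sample size
w     <- rnorm(T_len, mean = 0, sd = 2)       # simulated GWN realization
w_bar <- mean(w)
s_w   <- sqrt(sum((w - w_bar)^2) / T_len)     # time standard deviation, 1/T version
alpha <- 0.05
tcrit <- qt(1 - alpha / 2, df = T_len - 1)    # upper alpha/2 critical value
ci    <- w_bar + c(-1, 1) * tcrit * s_w / sqrt(T_len)   # realization of (3.54)
tstat <- w_bar / (s_w / sqrt(T_len))
pval  <- 2 * pt(-abs(tstat), df = T_len - 1)  # two-sided p-value for H0: mu = 0
c(lower = ci[1], upper = ci[2], tstat = tstat, pval = pval)
```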

Proposition 3.9 (Confidence Intervals for Time Avg. of a SWN) In case \(N=1\), assume that \(T\) is “large”. Then an approximate confidence interval for \(\mu_{\mathbf{W}}=0\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(\bar{W}_{T}-z_{\alpha/2}\frac{S_{\mathbf{W},T}}{\sqrt{T}},\ \bar{W}_{T}+z_{\alpha/2}\frac{S_{\mathbf{W},T}}{\sqrt{T}}\right), \tag{3.56} \end{equation}\] where \(z_{\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the standard Gaussian random variable and \(S_{\mathbf{W},T}\equiv\sqrt{S_{\mathbf{W},T}^{2}}\) is the time standard deviation of \(\mathbf{W}\). Hence, a realization of the confidence interval (3.56) is given by \[\begin{equation} \left(\bar{w}_{T}-z_{\alpha/2}\frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{w}_{T}+z_{\alpha/2}\frac{s_{\mathbf{W},T}}{\sqrt{T}}\right), \tag{3.57} \end{equation}\] where \(\bar{w}_{T}\) [resp. \(s_{\mathbf{W},T}\)] is the realization of the time average estimator \(\bar{\mathbf{W}}_{T}\) [resp. time standard deviation \(S_{\mathbf{W},T}\)] of \(\mathbf{W}\). In addition, there is evidence against the null hypothesis \(H_{0}:\mu_{\mathbf{W}}=0\) at the approximate significance level of \(100\alpha\%\), for any \(\alpha\in\left(0,1\right)\), when \[\begin{equation} \left\vert\frac{\bar{w}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert>z_{\alpha/2} \Leftrightarrow \mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert\frac{\bar{w}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert\right)<\alpha, \tag{3.58} \end{equation}\] where \(Z\sim N\left(0,1\right)\).
In particular, we have the approximate confidence intervals \[ \begin{array} [c]{ll} \left(\bar{w}_{T} -1.645 \frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{w}_{T} +1.645 \frac{s_{\mathbf{W},T}}{\sqrt{T}}\right) & \text{at 90% c.l.}\\ \left(\bar{w}_{T} -1.960 \frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{w}_{T} +1.960 \frac{s_{\mathbf{W},T}}{\sqrt{T}}\right) & \text{at 95% c.l.}\\ \left(\bar{w}_{T} -2.575 \frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{w}_{T} +2.575 \frac{s_{\mathbf{W},T}}{\sqrt{T}}\right) & \text{at 99% c.l.} \end{array} \] and there is evidence against the null hypothesis \(H_{0}:\mu_{\mathbf{W}}=0\) when \[ \begin{array} [c]{ll} \left\vert \frac{\bar{w}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert >1.645\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{\bar{w}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert \right)<0.1 & \text{at nearly 10% s.l.}\\ \left\vert \frac{\bar{w}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert >1.960\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{\bar{w}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert \right)<0.05 & \text{at nearly 5% s.l.}\\ \left\vert \frac{\bar{w}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert >2.575\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{\bar{w}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert \right)<0.01 & \text{at nearly 1% s.l.}. \end{array} \]
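
The corresponding large-sample computation in base R, again with hypothetical simulated data; qnorm() supplies the critical values \(z_{\alpha/2}\) quoted above.

```r
set.seed(5)
T_len <- 250                                 # hypothetical sample size
w     <- rnorm(T_len)                        # simulated SWN realization
w_bar <- mean(w)
s_w   <- sqrt(sum((w - w_bar)^2) / T_len)    # time standard deviation, 1/T version
z     <- qnorm(1 - 0.05 / 2)                 # upper 2.5% Gaussian critical value
ci    <- w_bar + c(-1, 1) * z * s_w / sqrt(T_len)   # realization of (3.57)
zstat <- w_bar * sqrt(T_len) / s_w
pval  <- 2 * pnorm(-abs(zstat))              # approximate two-sided p-value
```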

Proposition 3.10 (Time Variance of a GSWN) In case \(N=1\), assume that \(\mathbf{W}\) is Gaussian, that is \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\). Then a confidence interval for \(\sigma_{\mathbf{W}}^{2}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[ \left(\frac{\left(T-1\right)S_{\mathbf{W},T}^{2}}{\chi_{T-1,\alpha/2,u}^{2}},\ \frac{\left(T-1\right)S_{\mathbf{W},T}^{2}}{\chi_{T-1,\alpha/2,\ell}^{2}}\right), \tag{3.59} \] where \(\chi_{T-1,\alpha/2,u}^{2}\) [resp. \(\chi_{T-1,\alpha/2,\ell}^{2}\)] is the upper [resp. lower] \(\alpha/2\)-critical value of the chi-square distribution with \(T-1\) degrees of freedom and \(S_{\mathbf{W},T}^{2}\) is the time variance of \(\mathbf{W}\). Hence, a realization of the confidence interval (3.59) is given by \[ \left(\frac{\left(T-1\right)s_{\mathbf{W},T}^{2}}{\chi_{T-1,\alpha/2,u}^{2}},\ \frac{\left(T-1\right)s_{\mathbf{W},T}^{2}}{\chi_{T-1,\alpha/2,\ell}^{2}}\right) \tag{3.60} \] where \(s_{\mathbf{W},T}^{2}\) is the realization of the time variance \(S_{\mathbf{W},T}^{2}\). In addition, there is evidence against the null hypothesis \(H_{0}:\sigma_{\mathbf{W}}=\sigma\) at the significance level of \(100\alpha\%\), for any \(\alpha \in\left(0,1\right)\), when \[\begin{equation} \frac{\left(T-1\right)s^{2}_{\mathbf{W},T}}{\sigma^{2}}<\chi_{T-1,\alpha/2,\ell}^{2} \text{ or }\frac{\left(T-1\right)s^{2}_{\mathbf{W},T}}{\sigma^{2}}>\chi_{T-1,\alpha/2,u}^{2} \tag{3.61} \end{equation}\] equivalently \[\begin{equation} \min\left\{\mathbf{P}\left(\chi_{T-1}^{2}<\frac{\left(T-1\right)s^{2}_{\mathbf{W},T}}{\sigma^{2}}\right), \mathbf{P}\left(\chi_{T-1}^{2}>\frac{\left(T-1\right)s^{2}_{\mathbf{W},T}}{\sigma^{2}}\right)\right\} <\alpha/2, \tag{3.62} \end{equation}\] where \(\chi_{T-1}^{2}\) is the Chi-square random variable with \(T-1\) degrees of freedom.
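
A base-R sketch of the interval (3.60), with hypothetical simulated data. The sample variance is computed with var(), whose \(1/\left(T-1\right)\) convention is the one under which \(\left(T-1\right)S_{\mathbf{W},T}^{2}/\sigma_{\mathbf{W}}^{2}\) has an exact Chi-square distribution with \(T-1\) degrees of freedom.

```r
set.seed(3)
T_len <- 150                                     # hypothetical sample size
w     <- rnorm(T_len, sd = 2)                    # simulated GWN, true sigma = 2
s2    <- var(w)                                  # sample variance, 1/(T-1) version
alpha <- 0.05
chi_u <- qchisq(1 - alpha / 2, df = T_len - 1)   # upper alpha/2 critical value
chi_l <- qchisq(alpha / 2,     df = T_len - 1)   # lower alpha/2 critical value
ci    <- c((T_len - 1) * s2 / chi_u,             # realization of (3.60)
           (T_len - 1) * s2 / chi_l)
```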

Note that a Chi-square test is rather sensitive to deviations from the Gaussian distribution. Unlike the Student distribution, the Chi-square distribution is not robust to deviations from normality of the population distribution. If the white noise distribution is not Gaussian, or close enough to Gaussian, the null hypothesis may be mistakenly rejected.

Proposition 3.11 (Time Variance of a SWN) In case \(N=1\), assume that \(T\) is “large” and \(\mathbf{W}\) is a process of order \(4\). Then an approximate confidence interval for \(\sigma_{\mathbf{W}}^{2}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(\frac{S^{2}_{\mathbf{W},T}}{1+z_{\alpha/2}\sqrt{\left(Kurt_{\mathbf{W},T}-\frac{T-3}{T-1}\right)/T}}, \frac{S^{2}_{\mathbf{W},T}}{1-z_{\alpha/2}\sqrt{\left(Kurt_{\mathbf{W},T}-\frac{T-3}{T-1}\right)/T}}\right), \tag{3.63} \end{equation}\] where \(z_{\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the standard Gaussian random variable and \(S^{2}_{\mathbf{W},T}\) [resp. \(Kurt_{\mathbf{W},T}\)] is the time variance [resp. (biased) time standardized kurtosis] of \(\mathbf{W}\). Hence, an approximate realization of the confidence interval (3.63) is given by \[\begin{equation} \left(\frac{s^{2}_{\mathbf{W},T}}{1+z_{\alpha/2}\sqrt{\left(kurt_{\mathbf{W},T}-1\right)/T}}, \frac{s^{2}_{\mathbf{W},T}}{1-z_{\alpha/2}\sqrt{\left(kurt_{\mathbf{W},T}-1\right)/T}}\right), \tag{3.64} \end{equation}\] where \(s^{2}_{\mathbf{W},T}\) [resp. \(kurt_{\mathbf{W},T}\)] is the realization of the time variance \(S^{2}_{\mathbf{W},T}\) [resp. (biased) time standardized kurtosis \(Kurt_{\mathbf{W},T}\)] of \(\mathbf{W}\). 
In addition, there is evidence against the null hypothesis \(H_{0}:\sigma_{\mathbf{W}}^{2}=\sigma^{2}\) at the approximate significance level of \(100\alpha\%\), for any \(\alpha\in\left(0,1\right)\), when \[\begin{equation} \left\vert \frac{s^{2}_{\mathbf{W},T}-\sigma^{2}}{\sigma^{2} \sqrt{\left(kurt_{\mathbf{W},T}-1\right)/T}}\right\vert >z_{\alpha/2}\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{s^{2}_{\mathbf{W},T}-\sigma^{2}}{\sigma^{2} \sqrt{\left(kurt_{\mathbf{W},T}-1\right)/T}}\right\vert\right) <\alpha, \tag{3.65} \end{equation}\] where \(Z\sim N\left(0,1\right)\).
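As a sketch (ours), the realization (3.64) can be computed for a simulated non-Gaussian strong white noise, e.g. uniform noise on \(\left(-1,1\right)\), whose true variance is \(1/3\).

```r
# Illustrative sketch: kurtosis-adjusted large-sample confidence interval (3.64)
# for the variance of a uniform (non-Gaussian) white noise, true variance 1/3.
set.seed(42)
T_len <- 500
alpha <- 0.05
w <- runif(T_len, min = -1, max = 1)
s2 <- var(w)                                # realization of the time variance
m <- mean(w)
kurt <- mean((w - m)^4)/mean((w - m)^2)^2   # (biased) time standardized kurtosis
z <- qnorm(alpha/2, lower.tail = FALSE)
half <- z*sqrt((kurt - 1)/T_len)
ci <- c(s2/(1 + half), s2/(1 - half))       # realization of (3.64)
ci
```

For the uniform distribution the standardized kurtosis is \(9/5\), well below the Gaussian value \(3\), so the interval is narrower than the one a Gaussian assumption would produce.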

3.2 Prediction of Future States and Prediction Intervals

Let \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) be an \(N\)-variate strong white noise on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\). For the sake of simplicity, assume that \(\mathbb{T}\equiv\mathbb{N}_{0}\) or \(\mathbb{T}\equiv\mathbb{Z}\). Write \(\mu_{\mathbf{W}}\) [resp. \(\Sigma_{\mathbf{W}}^{2}\)] for the constant value of the mean [resp. variance-covariance] function of \(\mathbf{W}\). For any \(S,T\in \mathbb{N}\), write \(W_{T+S}\) for the \(S\)th future state of the process \(\mathbf{W}\) with respect to the current state \(W_{T}\) and write \(\hat{W}_{T+S\mid T}\) for the minimum square error predictor of the \(S\)th future state of the process \(\mathbf{W}\), given the information \(\mathcal{F}_{T}\equiv \sigma \left(W_{0},W_{1},\dots ,W_{T}\right)\) generated by the process \(\mathbf{W}\) itself up to and including time \(T\).

Proposition 3.12 (Future State Predictor of a SWN) The time average estimator \(\bar{W}_{T}\) is a point estimator for \(W_{T+S}\), for every \(T,S\in\mathbb{N}\).

Proof. Writing \(\mathbf{E}\left[\cdot\mid\mathcal{F}_{T}\right]\) for the conditional expectation operator given the information \(\mathcal{F}_{T}\), we know that \[\begin{equation} \hat{W}_{T+S\mid T}\overset{\text{def}}{=} \underset{Y\in L^{2}\left(\Omega_{\mathcal{F}_{T}};\mathbb{R}^{N}\right)} {\arg\min}\left\{\mathbf{E}\left[\left(W_{T+S}-Y\right)^{2}\right]\right\} =\mathbf{E}\left[W_{T+S}\mid \mathcal{F}_{T}\right], \tag{3.66} \end{equation}\] for all \(S,T\in \mathbb{N}\). Now, since the \(N\)-variate random variables in \(\mathbf{W}\) are independent and the mean function of \(\mathbf{W}\) is constant, we have \[\begin{equation} \mathbf{E}\left[W_{T+S}\mid\mathcal{F}_{T}\right]=\mathbf{E}\left[ W_{T+S}\right]=\mu_{\mathbf{W}}, \tag{3.67} \end{equation}\] for all \(S,T\in\mathbb{N}\). Combining (3.66) and (3.67), we obtain \[\begin{equation} \hat{W}_{T+S\mid T}=\mu_{\mathbf{W}}, \tag{3.68} \end{equation}\] for all \(S,T\in \mathbb{N}\). On the other hand, \(\bar{W}_{T}\) is a point estimator for \(\mu _{\mathbf{W}}\) and the desired claim follows.

Proposition 3.13 (Prediction Intervals for Future States of a GSWN) In case \(N=1\), assume that \(\mathbf{W}\) is Gaussian, \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\), for some \(\sigma_{\mathbf{W}}>0\). Then, a prediction interval for the state \(W_{T+S}\), for any \(S\in\mathbb{N}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(\bar{W}_{T}-t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}},\ \bar{W}_{T}+t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}\right), \tag{3.69} \end{equation}\] where \(t_{T-1,\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the standard Student \(t_{T-1}\) distribution with \(T-1\) degrees of freedom and \(S_{\mathbf{W},T}\equiv\sqrt{S_{\mathbf{W},T}^{2}}\) is the time standard deviation of \(\mathbf{W}\). Hence, a realization of the prediction interval (3.69) is given by \[\begin{equation} \left(\bar{w}_{T}-t_{T-1,\alpha/2}s_{\mathbf{W},T}\sqrt{1+\frac{1}{T}},\ \bar{w}_{T}+t_{T-1,\alpha/2}s_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}\right), \tag{3.70} \end{equation}\] where \(\bar{w}_{T}\) [resp. \(s_{\mathbf{W},T}\)] is the realization of the time average estimator \(\bar{W}_{T}\) [resp. time standard deviation \(S_{\mathbf{W},T}\)] of \(\mathbf{W}\).

Proof. By virtue of Proposition 3.12 and the Gaussianity assumption on \(\mathbf{W}\), the statistic \[\begin{equation} \frac{W_{T+S}-\hat{W}_{T+S\mid T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}} =\frac{W_{T+S}-\bar{W}_{T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}} \equiv X \end{equation}\] has the standard Student \(t_{T-1}\) distribution with \(T-1\) degrees of freedom. In fact, since \(\mathbf{W}\) is a Gaussian strong white noise, the random variable \[ \left(W_{T+S}-\bar{W}_{T}\right)\Big/\left(\sigma_{\mathbf{W}}\sqrt{1+\frac{1}{T}}\right)\equiv Z \] is Gaussian distributed. Moreover, we have \[\begin{equation} \mathbf{E}\left[Z\right] =\frac{\mathbf{E}\left[W_{T+S}\right]-\mathbf{E}\left[\bar{W}_{T}\right]} {\sigma_{\mathbf{W}}\sqrt{1+\frac{1}{T}}} =0 \end{equation}\] and \[\begin{equation} \mathbf{D}^{2}\left[Z\right] =\frac{\mathbf{D}^{2}\left[W_{T+S}\right]+\mathbf{D}^{2}\left[\bar{W}_{T}\right]} {\sigma_{\mathbf{W}}^{2}\left(1+\frac{1}{T}\right)} =\frac{\sigma_{\mathbf{W}}^{2}+\frac{1}{T}\sigma_{\mathbf{W}}^{2}}{\sigma_{\mathbf{W}}^{2}\left(1+\frac{1}{T}\right)} =1. \end{equation}\] That is, \(Z\sim N\left(0,1\right)\). On the other hand, the random variable \(\left(T-1\right)S_{\mathbf{W},T}^{2}/\sigma_{\mathbf{W}}^{2}\equiv Y\) has the standard Chi-square distribution \(\chi^{2}_{T-1}\) with \(T-1\) degrees of freedom. Moreover, \(Z\) and \(Y\) are independent, since \(W_{T+S}\) is independent of \(W_{1},\dots,W_{T}\) and, \(\mathbf{W}\) being Gaussian, \(\bar{W}_{T}\) and \(S_{\mathbf{W},T}^{2}\) are independent. It follows that the statistic \[\begin{equation} \frac{Z}{\sqrt{Y/\left(T-1\right)}}\equiv\frac{\frac{W_{T+S}-\bar{W}_{T}}{\sigma_{\mathbf{W}}\sqrt{1+\frac{1}{T}}}} {\sqrt{\left(T-1\right)\frac{S_{\mathbf{W},T}^{2}}{\sigma_{\mathbf{W}}^{2}}/\left(T-1\right)}} = X \end{equation}\] has the standard Student \(t_{T-1}\) distribution with \(T-1\) degrees of freedom. 
As a consequence, we can write \[\begin{equation} \mathbf{P}\left(\left\vert\frac{W_{T+S}-\hat{W}_{T+S\mid T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}}\right\vert <t_{T-1,\alpha/2}\right)=1-\alpha, \end{equation}\] where \(t_{T-1,\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the standard Student \(t_{T-1}\) distribution with \(T-1\) degrees of freedom. Now, we have \[\begin{align} \left\vert\frac{W_{T+S}-\hat{W}_{T+S\mid T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}}\right\vert &<t_{T-1,\alpha/2} \Leftrightarrow -t_{T-1,\alpha/2}<\frac{W_{T+S}-\hat{W}_{T+S\mid T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}}<t_{T-1,\alpha/2}\\ & \Leftrightarrow\hat{W}_{T+S\mid T}-t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}} <W_{T+S}< \hat{W}_{T+S\mid T}+t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}\\ & \Leftrightarrow\bar{W}_{T}-t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}} <W_{T+S}< \bar{W}_{T}+t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}. \end{align}\] Therefore, \[\begin{equation} \mathbf{P}\left(\bar{W}_{T}-t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}<W_{T+S}< \bar{W}_{T}+t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}\right)=1-\alpha, \end{equation}\] which shows that (3.69) is a prediction interval for the state \(W_{T+S}\), for any \(S\in\mathbb{N}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\).
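The nominal coverage of the interval (3.70) can be checked by simulation. The sketch below (ours; the seed, sample size, and number of replications are arbitrary) estimates the empirical coverage of the one-step-ahead prediction interval for a Gaussian strong white noise.

```r
# Hedged simulation: empirical coverage of the Student-t prediction interval
# (3.70) for the next state of a Gaussian white noise with mean 2, sd 3.
set.seed(7)
T_len <- 50; alpha <- 0.05; R <- 4000
t_val <- qt(alpha/2, df = T_len - 1, lower.tail = FALSE)
covered <- replicate(R, {
  w <- rnorm(T_len + 1, mean = 2, sd = 3)    # w[T_len + 1] plays the future state
  wbar <- mean(w[1:T_len])
  s <- sd(w[1:T_len])
  abs(w[T_len + 1] - wbar) < t_val*s*sqrt(1 + 1/T_len)
})
mean(covered)   # should be close to the nominal 0.95
```

Since the interval is exact under Gaussianity, the empirical coverage fluctuates around \(1-\alpha\) only through simulation noise.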

3.3 Application to the MARWS Time Series

As shown above, we cannot reject the null hypothesis that the residual component of the MARWS_log time series is the sample path of a Gaussian white noise. Therefore, we apply to the residual component of the MARWS_log time series the formulas for the prediction of future states and the determination of prediction intervals.

length <- nrow(MARWS_df)
TrnS_length <- floor(length*0.9)
forecast_length <- length-TrnS_length
mu <- 0.00
y_res <- MARWS_log_dec_df$y_res
y_res_mean <- 0
y_res_sd <- sd(y_res)
alpha <- c(0.1, 0.05, 0.01)
z_val <- qnorm(alpha/2, lower.tail = FALSE)
y_res_conf_int_low <- y_res_mean - z_val*y_res_sd
y_res_conf_int_upp <- y_res_mean + z_val*y_res_sd
y_res_conf_int <- c(y_res_conf_int_low[3],y_res_conf_int_low[2],y_res_conf_int_low[1],
                    y_res_conf_int_upp[1],y_res_conf_int_upp[2],y_res_conf_int_upp[3])
T <- length(y_res)
t_val <- qt(alpha/2, df=(T-1), lower.tail = FALSE)
y_res_pred_int_low <- y_res_mean - t_val*y_res_sd*sqrt(1+1/T)
y_res_pred_int_upp <- y_res_mean + t_val*y_res_sd*sqrt(1+1/T)
y_res_pred_int <- c(y_res_pred_int_low[3],y_res_pred_int_low[2],y_res_pred_int_low[1],y_res_pred_int_upp[1],
                    y_res_pred_int_upp[2],y_res_pred_int_upp[3])
# We set the prediction length and build the prediction data frame
y_res_ext_vec <- c(y_res,rep(NA,forecast_length))
set.seed(12345)
y_res_ext_12345_vec  <- c(rep(NA,TrnS_length), rnorm(forecast_length, mean = 0, sd = y_res_sd))
set.seed(23451)
y_res_ext_23451_vec  <- c(rep(NA,TrnS_length), rnorm(forecast_length, mean = 0, sd = y_res_sd))
set.seed(34512)
y_res_ext_34512_vec  <- c(rep(NA,TrnS_length), rnorm(forecast_length, mean = 0, sd = y_res_sd))
y_res_conf_int_mat <- t(matrix(cbind(rep(y_res_conf_int, times=TrnS_length)), nrow=6))
y_res_pred_vec  <- c(rep(NA,TrnS_length), rep(y_res_mean, times=forecast_length))
y_res_pred_int_mat <- t(matrix(cbind(rep(y_res_pred_int, times=forecast_length)), nrow=6))
# We build the object y_res_pred_ts of class "ts" from the object y_res_pred_vec of class "numeric" to generate
# years and months for the prediction data frame
y_res_pred_ts <- ts(y_res_pred_vec, start=c(1980,1), frequency=12)
# We build the prediction data frame
y_res_pred_df <- data.frame(t=1:length(y_res_pred_ts), Year = trunc(time(y_res_pred_ts)), 
                            Month=month.abb[cycle(y_res_pred_ts)],
                            y_res_ext=y_res_ext_vec,
                            y_res_ext_12345=y_res_ext_12345_vec,
                            y_res_ext_23451=y_res_ext_23451_vec,
                            y_res_ext_34512=y_res_ext_34512_vec,
                            y_res_pred=as.vector(y_res_pred_ts),
                            y_res_conf_99_int_low=c(y_res_conf_int_mat[,1],rep(NA,times=forecast_length)),
                            y_res_conf_95_int_low=c(y_res_conf_int_mat[,2],rep(NA,times=forecast_length)),
                            y_res_conf_90_int_low=c(y_res_conf_int_mat[,3],rep(NA,times=forecast_length)),
                            y_res_conf_90_int_upp=c(y_res_conf_int_mat[,4],rep(NA,times=forecast_length)),
                            y_res_conf_95_int_upp=c(y_res_conf_int_mat[,5],rep(NA,times=forecast_length)),
                            y_res_conf_99_int_upp=c(y_res_conf_int_mat[,6],rep(NA,times=forecast_length)),
                            y_res_pred_99_int_low=c(rep(NA,times=T),y_res_pred_int_mat[,1]),
                            y_res_pred_95_int_low=c(rep(NA,times=T),y_res_pred_int_mat[,2]),
                            y_res_pred_90_int_low=c(rep(NA,times=T),y_res_pred_int_mat[,3]),
                            y_res_pred_90_int_upp=c(rep(NA,times=T),y_res_pred_int_mat[,4]),
                            y_res_pred_95_int_upp=c(rep(NA,times=T),y_res_pred_int_mat[,5]),
                            y_res_pred_99_int_upp=c(rep(NA,times=T),y_res_pred_int_mat[,6]))
head(y_res_pred_df)
##   t Year Month   y_res_ext y_res_ext_12345 y_res_ext_23451 y_res_ext_34512
## 1 1 1980   Jan -0.07074295              NA              NA              NA
## 2 2 1980   Feb  0.01625797              NA              NA              NA
## 3 3 1980   Mar -0.11930958              NA              NA              NA
## 4 4 1980   Apr  0.01138457              NA              NA              NA
## 5 5 1980   May  0.10867305              NA              NA              NA
## 6 6 1980   Jun  0.03187805              NA              NA              NA
##   y_res_pred y_res_conf_99_int_low y_res_conf_95_int_low y_res_conf_90_int_low
## 1         NA            -0.2409383            -0.1833314            -0.1538566
## 2         NA            -0.2409383            -0.1833314            -0.1538566
## 3         NA            -0.2409383            -0.1833314            -0.1538566
## 4         NA            -0.2409383            -0.1833314            -0.1538566
## 5         NA            -0.2409383            -0.1833314            -0.1538566
## 6         NA            -0.2409383            -0.1833314            -0.1538566
##   y_res_conf_90_int_upp y_res_conf_95_int_upp y_res_conf_99_int_upp
## 1             0.1538566             0.1833314             0.2409383
## 2             0.1538566             0.1833314             0.2409383
## 3             0.1538566             0.1833314             0.2409383
## 4             0.1538566             0.1833314             0.2409383
## 5             0.1538566             0.1833314             0.2409383
## 6             0.1538566             0.1833314             0.2409383
##   y_res_pred_99_int_low y_res_pred_95_int_low y_res_pred_90_int_low
## 1                    NA                    NA                    NA
## 2                    NA                    NA                    NA
## 3                    NA                    NA                    NA
## 4                    NA                    NA                    NA
## 5                    NA                    NA                    NA
## 6                    NA                    NA                    NA
##   y_res_pred_90_int_upp y_res_pred_95_int_upp y_res_pred_99_int_upp
## 1                    NA                    NA                    NA
## 2                    NA                    NA                    NA
## 3                    NA                    NA                    NA
## 4                    NA                    NA                    NA
## 5                    NA                    NA                    NA
## 6                    NA                    NA                    NA
tail(y_res_pred_df,20)
##       t Year Month  y_res_ext y_res_ext_12345 y_res_ext_23451 y_res_ext_34512
## 168 168 1993   Dec 0.05357952              NA              NA              NA
## 169 169 1994   Jan         NA      0.05476928     0.114140088     0.169968535
## 170 170 1994   Feb         NA      0.06636214     0.047855090     0.051597175
## 171 171 1994   Mar         NA     -0.01022403     0.027438080     0.103956003
## 172 172 1994   Apr         NA     -0.04241929     0.038453982     0.012258932
## 173 173 1994   May         NA      0.05667359    -0.205110609    -0.091667775
## 174 174 1994   Jun         NA     -0.17004824     0.227376246     0.017859514
## 175 175 1994   Jul         NA      0.05893825     0.016818878    -0.037477542
## 176 176 1994   Aug         NA     -0.02583375     0.085028411    -0.001111783
## 177 177 1994   Sep         NA     -0.02657978    -0.007284711    -0.027724435
## 178 178 1994   Oct         NA     -0.08599168     0.025993101    -0.119136097
## 179 179 1994   Nov         NA     -0.01087360    -0.044090599     0.027774277
## 180 180 1994   Dec         NA      0.16998801    -0.014105764     0.079678460
## 181 181 1995   Jan         NA      0.03466785    -0.066026736     0.055218817
## 182 182 1995   Feb         NA      0.04866009    -0.045708894     0.184725173
## 183 183 1995   Mar         NA     -0.07020337     0.149419724    -0.125327031
## 184 184 1995   Apr         NA      0.07641130    -0.090052977    -0.073074748
## 185 185 1995   May         NA     -0.08290824     0.054215114     0.011069914
## 186 186 1995   Jun         NA     -0.03101515    -0.046092023    -0.098917801
## 187 187 1995   Jul         NA      0.10482939    -0.008890516     0.070145036
##     y_res_pred y_res_conf_99_int_low y_res_conf_95_int_low
## 168         NA            -0.2409383            -0.1833314
## 169          0                    NA                    NA
## 170          0                    NA                    NA
## 171          0                    NA                    NA
## 172          0                    NA                    NA
## 173          0                    NA                    NA
## 174          0                    NA                    NA
## 175          0                    NA                    NA
## 176          0                    NA                    NA
## 177          0                    NA                    NA
## 178          0                    NA                    NA
## 179          0                    NA                    NA
## 180          0                    NA                    NA
## 181          0                    NA                    NA
## 182          0                    NA                    NA
## 183          0                    NA                    NA
## 184          0                    NA                    NA
## 185          0                    NA                    NA
## 186          0                    NA                    NA
## 187          0                    NA                    NA
##     y_res_conf_90_int_low y_res_conf_90_int_upp y_res_conf_95_int_upp
## 168            -0.1538566             0.1538566             0.1833314
## 169                    NA                    NA                    NA
## 170                    NA                    NA                    NA
## 171                    NA                    NA                    NA
## 172                    NA                    NA                    NA
## 173                    NA                    NA                    NA
## 174                    NA                    NA                    NA
## 175                    NA                    NA                    NA
## 176                    NA                    NA                    NA
## 177                    NA                    NA                    NA
## 178                    NA                    NA                    NA
## 179                    NA                    NA                    NA
## 180                    NA                    NA                    NA
## 181                    NA                    NA                    NA
## 182                    NA                    NA                    NA
## 183                    NA                    NA                    NA
## 184                    NA                    NA                    NA
## 185                    NA                    NA                    NA
## 186                    NA                    NA                    NA
## 187                    NA                    NA                    NA
##     y_res_conf_99_int_upp y_res_pred_99_int_low y_res_pred_95_int_low
## 168             0.2409383                    NA                    NA
## 169                    NA            -0.2444463            -0.1852185
## 170                    NA            -0.2444463            -0.1852185
## 171                    NA            -0.2444463            -0.1852185
## 172                    NA            -0.2444463            -0.1852185
## 173                    NA            -0.2444463            -0.1852185
## 174                    NA            -0.2444463            -0.1852185
## 175                    NA            -0.2444463            -0.1852185
## 176                    NA            -0.2444463            -0.1852185
## 177                    NA            -0.2444463            -0.1852185
## 178                    NA            -0.2444463            -0.1852185
## 179                    NA            -0.2444463            -0.1852185
## 180                    NA            -0.2444463            -0.1852185
## 181                    NA            -0.2444463            -0.1852185
## 182                    NA            -0.2444463            -0.1852185
## 183                    NA            -0.2444463            -0.1852185
## 184                    NA            -0.2444463            -0.1852185
## 185                    NA            -0.2444463            -0.1852185
## 186                    NA            -0.2444463            -0.1852185
## 187                    NA            -0.2444463            -0.1852185
##     y_res_pred_90_int_low y_res_pred_90_int_upp y_res_pred_95_int_upp
## 168                    NA                    NA                    NA
## 169            -0.1551746             0.1551746             0.1852185
## 170            -0.1551746             0.1551746             0.1852185
## 171            -0.1551746             0.1551746             0.1852185
## 172            -0.1551746             0.1551746             0.1852185
## 173            -0.1551746             0.1551746             0.1852185
## 174            -0.1551746             0.1551746             0.1852185
## 175            -0.1551746             0.1551746             0.1852185
## 176            -0.1551746             0.1551746             0.1852185
## 177            -0.1551746             0.1551746             0.1852185
## 178            -0.1551746             0.1551746             0.1852185
## 179            -0.1551746             0.1551746             0.1852185
## 180            -0.1551746             0.1551746             0.1852185
## 181            -0.1551746             0.1551746             0.1852185
## 182            -0.1551746             0.1551746             0.1852185
## 183            -0.1551746             0.1551746             0.1852185
## 184            -0.1551746             0.1551746             0.1852185
## 185            -0.1551746             0.1551746             0.1852185
## 186            -0.1551746             0.1551746             0.1852185
## 187            -0.1551746             0.1551746             0.1852185
##     y_res_pred_99_int_upp
## 168                    NA
## 169             0.2444463
## 170             0.2444463
## 171             0.2444463
## 172             0.2444463
## 173             0.2444463
## 174             0.2444463
## 175             0.2444463
## 176             0.2444463
## 177             0.2444463
## 178             0.2444463
## 179             0.2444463
## 180             0.2444463
## 181             0.2444463
## 182             0.2444463
## 183             0.2444463
## 184             0.2444463
## 185             0.2444463
## 186             0.2444463
## 187             0.2444463

We plot the actual and predicted paths of the residual component of the MARWS_log time series, together with the corresponding confidence and prediction intervals.

Data_df <- y_res_pred_df
  length <- nrow(Data_df)
  T <- length(y_res)
  mu <- round(mean(y_res), digits=3)
  sigma <- round(sd(y_res), digits=3)
  First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
  Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
  title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Predicted Residual Component of the *MARWS_log* Time Series from ", .(First_Date), " to ", .(Last_Date), sep="")))
  subtitle_content <- bquote(paste("path length ", .(length), " sample points,  estimated mean ", mu[W]==.(mu), ",  estimated stand. dev. ", sigma[W]==.(sigma), sep=""))
  caption_content <- "Author: Roberto Monte"
  x_name <- bquote("")
  x_breaks_num <- 30
  x_breaks_low <- Data_df$t[1]
  x_breaks_up <- Data_df$t[length]
  x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
  x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
  if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
  x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
  J <- 0
  x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
  y_name <- bquote("residuals (log kliters)")
  y_breaks_num <- 10
  y_max <- max(na.omit(Data_df$y_res_ext),na.omit(Data_df$y_res_ext_12345),na.omit(Data_df$y_res_ext_23451),
               na.omit(Data_df$y_res_ext_34512), na.omit(Data_df$y_res_conf_99_int_upp), na.omit(Data_df$y_res_pred_99_int_upp))
  y_min <- min(na.omit(Data_df$y_res_ext),na.omit(Data_df$y_res_ext_12345),na.omit(Data_df$y_res_ext_23451),
               na.omit(Data_df$y_res_ext_34512), na.omit(Data_df$y_res_conf_99_int_low), na.omit(Data_df$y_res_pred_99_int_low))
  y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
  y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
  y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
  y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
  y_labs <- format(y_breaks, scientific=FALSE)
  K <- 0
  y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
  line_black     <- bquote("in-sample path")
  line_brown     <- bquote("predicted path")
  line_cyan      <- bquote("12345 - out-of-sample path")
  line_slateblue <- bquote("23451 - out-of-sample path")
  line_orangered <- bquote("34512 - out-of-sample path")
  line_green   <- bquote("90% pred.int.")
  line_blue    <- bquote("95% pred.int.")
  line_red     <- bquote("99% pred.int.")
  leg_line_labs   <- c(line_black, line_brown, line_cyan, line_slateblue, line_orangered, line_green, line_blue, line_red)
  leg_line_breaks <- c("line_black", "line_brown", "line_cyan", "line_slateblue", "line_orangered", "line_green", 
                       "line_blue", "line_red")
  leg_line_cols   <- c("line_black"="black", "line_brown"="brown", "line_cyan"="cyan", "line_slateblue"="slateblue",
                       "line_orangered"="orangered", "line_green"="green", "line_blue"="blue", "line_red"="red")
  fill_grey20 <- bquote("90% conf. band")
  fill_grey40 <- bquote("95% conf. band")
  fill_grey60 <- bquote("99% conf. band")
  fill_g <- bquote("90% pred. band")
  fill_b <- bquote("95% pred. band")
  fill_r <- bquote("99% pred. band")
  leg_fill_labs   <- c(fill_grey20, fill_grey40, fill_grey60, fill_g, fill_b, fill_r)
  leg_fill_breaks <- c("fill_grey20", "fill_grey40", "fill_grey60", "fill_g", "fill_b", "fill_r")
  leg_fill_cols   <- c("fill_grey20"="grey40", "fill_grey40"="grey60", "fill_grey60"="grey80",
                       "fill_g"="lightgreen", "fill_b"="blue", "fill_r"="orangered")
  leg_col_labs    <- leg_line_labs
  leg_col_breaks  <- leg_line_breaks
  leg_col_cols    <- leg_line_cols
  y_res_pred_lp <- ggplot(Data_df, aes(x=t)) + 
    geom_line(data=subset(Data_df, Data_df$t <= t[T+1]), aes(y=y_res_ext, color="line_black"),
              linetype="solid", alpha=1, size=1, group=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_res_ext_12345, color="line_cyan"),
              linetype="solid", alpha=1, size=1, group=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_res_ext_23451, color="line_slateblue"),
              linetype="solid", alpha=1, size=1, group=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_res_ext_34512, color="line_orangered"),
              linetype="solid", alpha=1, size=1, group=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]),
              aes(y=y_res_pred, colour="line_brown"), linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]),
              aes(y=y_res_pred_99_int_low, colour="line_red"), linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]),
              aes(y=y_res_pred_99_int_upp, colour="line_red"), linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]),
              aes(y=y_res_pred_95_int_low, colour="line_blue"), linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]),
              aes(y=y_res_pred_95_int_upp, colour="line_blue"), linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), 
              aes(y=y_res_pred_90_int_low, colour="line_green"), linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), 
              aes(y=y_res_pred_90_int_upp, colour="line_green"), linetype="solid", alpha=1, size=1) +
    geom_ribbon(data=subset(Data_df, Data_df$t <= t[T]), alpha=0.3, colour="grey80",
                aes(ymin=y_res_conf_99_int_low, ymax=y_res_conf_95_int_low, fill="fill_grey60")) +
    geom_ribbon(data=subset(Data_df, Data_df$t <= t[T]), alpha=0.3, colour="grey80",
                aes(ymin=y_res_conf_95_int_upp, ymax=y_res_conf_99_int_upp, fill="fill_grey60")) +
    geom_ribbon(data=subset(Data_df, Data_df$t <= t[T]), alpha=0.3, colour="grey60",
                aes(ymin=y_res_conf_95_int_low, ymax=y_res_conf_90_int_low, fill="fill_grey40")) +
    geom_ribbon(data=subset(Data_df, Data_df$t <= t[T]), alpha=0.3, colour="grey60",
                aes(ymin=y_res_conf_90_int_upp, ymax=y_res_conf_95_int_upp, fill="fill_grey40")) +
    geom_ribbon(data=subset(Data_df, Data_df$t <= t[T]), alpha=0.3, colour="grey40",
                aes(ymin=y_res_conf_90_int_low, ymax=y_res_conf_90_int_upp, fill="fill_grey20")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="orangered",
                aes(ymin=y_res_pred_99_int_low, ymax=y_res_pred_95_int_low, fill="fill_r")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="orangered",
                aes(ymin=y_res_pred_95_int_upp, ymax=y_res_pred_99_int_upp, fill="fill_r")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="blue",
                aes(ymin=y_res_pred_95_int_low, ymax=y_res_pred_90_int_low, fill="fill_b")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="blue",
                aes(ymin=y_res_pred_90_int_upp, ymax=y_res_pred_95_int_upp, fill="fill_b")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="lightgreen",
                aes(ymin=y_res_pred_90_int_low, ymax=y_res_pred_90_int_upp, fill="fill_g")) +
    scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
    scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                       sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
    ggtitle(title_content) +
    labs(subtitle=subtitle_content, caption=caption_content) +
    guides(linetype="none", shape="none") +
    scale_colour_manual(name="Legend", labels=leg_line_labs, values=leg_line_cols, breaks=leg_line_breaks) +
    scale_fill_manual(name="", labels=leg_fill_labs, values=leg_fill_cols, breaks=leg_fill_breaks) +
    guides(colour=guide_legend(order=1), fill=guide_legend(order=2)) +
    theme(plot.title=element_text(hjust = 0.5), 
          plot.subtitle=element_text(hjust =  0.5),
          plot.caption = element_text(hjust = 1.0),
          axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
          legend.key.width = unit(0.8,"cm"), legend.position="bottom")
  plot(y_res_pred_lp)
## Warning: Removed 1 row containing missing values (`geom_line()`).

So far, we have built an optimal prediction of the future states of the residual component of the MARWS_log time series and the corresponding prediction intervals. Now, we want to build a prediction of the future states of the seasonal and trend components of the MARWS_log time series. Clearly, for the latter there are no corresponding prediction intervals, since these components are deterministic.

To build the prediction of the future states of the seasonal component of the MARWS_log time series, we just need to repeat cyclically the entries of the seasonal component itself.

  length <- nrow(MARWS_df)
  TrnS_length <- floor(length*0.9)
  forecast_length <- length-TrnS_length
  q <- forecast_length%/%12
  r <- forecast_length%%12
  Data_df <-  MARWS_log_dec_df
  y_seas <- Data_df$y_seas
  y_seas_pred_vec <- c(y_seas,rep(y_seas[c(1:12)],q),y_seas[c(1:r)])
  y_seas_pred_ts <- ts(y_seas_pred_vec, start=c(1980,1), frequency=12)
plot(y_seas_pred_ts, type="l", col="blue", xlab="date", ylab="log kliters", main="Prediction of the Seasonal Component of the AU Red Wine Monthly Sales (log kliters) from 01-1980 to 07-1995")

To build the prediction of the future states of the trend component of the MARWS_log time series, various strategies are possible. For instance, a naive strategy is to extend the trend component over the entire validation set, assigning it the last value computed on the training set.

length <- nrow(MARWS_df)
TrnS_length <- floor(length*0.9)
forecast_length <- length-TrnS_length
Data_df <-  MARWS_log_dec_df
y_trend <- Data_df$y_trend
length(y_trend)
## [1] 168
y_trend_naive_ext_vec <- c(y_trend, rep(y_trend[TrnS_length],forecast_length))
length(y_trend_naive_ext_vec)
## [1] 187
y_trend_naive_ext_ts <- ts(y_trend_naive_ext_vec, start=c(1980,1), frequency=12)
plot( y_trend_naive_ext_ts, type="l", col="blue", xlab="date", ylab="log kilolitres", main="Naive Prediction of the Annual Avg. Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from 01-1980 to 07-1995")

Another strategy is to extend the trend component by integer multiples of the average annual increase.

length <- nrow(MARWS_df)
TrnS_length <- floor(length*0.9)
forecast_length <- length-TrnS_length
r <- forecast_length%%12
Data_df <-  MARWS_log_dec_df
y_trend <- Data_df$y_trend
length(y_trend)
## [1] 168
y_ann_trend <- y_trend[c(seq(1,length(y_trend),by=12))]
y_av_jump <- mean(diff(y_ann_trend))
y_trend_aver_ext_vec <- c(y_trend, rep(y_trend[length(y_trend)]+1*y_av_jump,12), rep(y_trend[length(y_trend)]+2*y_av_jump,r))
length(y_trend_aver_ext_vec)
## [1] 187
y_trend_aver_ext_ts <- ts(y_trend_aver_ext_vec, start=c(1980,1), frequency=12)
plot(y_trend_aver_ext_ts, type="l", col="blue", xlab="date", ylab="log kilolitres", main="Avg. Prediction of the Annual Avg. Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from 01-1980 to 07-1995")
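A variant of this strategy (our own aside, not used in the sequel) spreads the average annual jump uniformly across the months, producing a linear drift instead of annual steps. A minimal sketch with toy values:

```r
last_val  <- 7.64   # toy stand-in for y_trend[length(y_trend)]
y_av_jump <- 0.06   # toy stand-in for the average annual increase
h         <- 19     # toy forecast horizon
# one twelfth of the annual jump is added for each forecasted month
drift_ext <- last_val + (1:h) * (y_av_jump / 12)
```

After twelve months the drift extension reaches the same level as the first annual step of the method above.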

A third strategy is to rely on LOESS smoothing of the annual average trend component.

length <- nrow(MARWS_df)
TrnS_length <- floor(length*0.9)
forecast_length <- length-TrnS_length
Data_df <-  MARWS_log_dec_df
t <- Data_df$t
y_trend <- Data_df$y_trend
y_trend_loess_deg_1 <- loess(y_trend~t, degree=1, family = "gaussian", control = loess.control(surface = "direct"))
y_trend_loess_deg_2 <- loess(y_trend~t, degree=2, family = "gaussian", control = loess.control(surface = "direct"))
plot(y_trend_loess_deg_1[["fitted"]], type="l", col="blue", main="LOESS Smoothing (deg 1) of the Annual Avg. Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from Jan 1980 to Dec 1993")

plot(y_trend_loess_deg_2[["fitted"]], type="l", col="blue", main="LOESS Smoothing (deg 2) of the Annual Avg. Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from Jan 1980 to Dec 1993")

y_trend_loess_deg_1_pred <- predict(object = y_trend_loess_deg_1, newdata = seq(TrnS_length+1, length, by=1), se = FALSE)
y_trend_loess_deg_2_pred <- predict(object = y_trend_loess_deg_2, newdata = seq(TrnS_length+1, length, by=1), se = FALSE)
y_trend_loess_deg_1_ext_vec <- c(y_trend, as.vector(y_trend_loess_deg_1_pred))
y_trend_loess_deg_2_ext_vec <- c(y_trend, as.vector(y_trend_loess_deg_2_pred))
y_trend_loess_deg_1_ext_ts <- ts(y_trend_loess_deg_1_ext_vec, start=c(1980,1), frequency=12)
y_trend_loess_deg_2_ext_ts <- ts(y_trend_loess_deg_2_ext_vec, start=c(1980,1), frequency=12)
plot(y_trend_loess_deg_1_ext_ts, type="l", col="blue", xlab="date", ylab="log kilolitres", main="LOESS (smoothing deg 1) Prediction of the Annual Avg. Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from Jan 1980 to Jul 1995")

plot(y_trend_loess_deg_2_ext_ts, type="l", col="blue", xlab="date", ylab="log kilolitres", main="LOESS (smoothing deg 2) Prediction of the Annual Avg. Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from Jan 1980 to Jul 1995")
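The option `loess.control(surface = "direct")` used above is essential here: with the default interpolation surface, `predict()` cannot extrapolate beyond the fitted range and returns `NA` for out-of-sample indices. A minimal sketch on synthetic data:

```r
set.seed(1)
t_fit <- 1:50
y_fit <- 0.05 * t_fit + rnorm(50, sd = 0.1)  # toy linear trend plus noise
fit_interp <- loess(y_fit ~ t_fit)           # default: surface = "interpolate"
fit_direct <- loess(y_fit ~ t_fit,
                    control = loess.control(surface = "direct"))
predict(fit_interp, newdata = data.frame(t_fit = 55))  # NA: extrapolation not allowed
predict(fit_direct, newdata = data.frame(t_fit = 55))  # finite extrapolated value
```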

As a final strategy, we consider a prediction based on the LOESS smoothing of the trend component computed by moving average. This is similar to the LOESS estimation of the trend in the stl decomposition.

length <- nrow(MARWS_df)
TrnS_length <- floor(length*0.9)
forecast_length <- length-TrnS_length
Data_df <-  MARWS_log_dec_df
t <- Data_df$t
y_trend_ma_12_ts <- forecast::ma(Data_df$y, order=12, centre=TRUE)
class(y_trend_ma_12_ts)
## [1] "ts"
head(y_trend_ma_12_ts, 12)
## Time Series:
## Start = 1 
## End = 12 
## Frequency = 1 
##  [1]       NA       NA       NA       NA       NA       NA 6.839058 6.855791
##  [9] 6.876998 6.893843 6.902812 6.912373
tail(y_trend_ma_12_ts, 12)
## Time Series:
## Start = 157 
## End = 168 
## Frequency = 1 
##  [1] 7.613479 7.633439 7.644405 7.640671 7.642529 7.639271       NA       NA
##  [9]       NA       NA       NA       NA
plot(y_trend_ma_12_ts, type="l", col="blue", main="Moving Avg. (win=12) of the AU Red Wine Monthly Sales (log kliters) from Jan 1980 to Dec 1993")

y_trend_ma_12_deg_1_loess <- loess(y_trend_ma_12_ts~t, degree=1, family = "gaussian", control = loess.control(surface = "direct"))
plot( y_trend_ma_12_deg_1_loess[["fitted"]], type="l", col="blue", main="LOESS Smoothing (deg 1) of the Moving Avg. (win=12) Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from Jan 1980 to Dec 1993")

y_trend_ma_12_deg_2_loess <- loess(y_trend_ma_12_ts~t, degree=2, family = "gaussian", control = loess.control(surface = "direct"))
plot(y_trend_ma_12_deg_2_loess[["fitted"]], type="l", col="blue", main="LOESS Smoothing (deg 2) of the Moving Avg. (win=12) Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from Jan 1980 to Dec 1993")

y_trend_ma_12_deg_1_loess_pred_vec <- predict(object = y_trend_ma_12_deg_1_loess, newdata = seq(TrnS_length+1, length, by=1), se = FALSE)
y_trend_ma_12_deg_1_loess_ext_vec <- c(y_trend_ma_12_ts, as.vector(y_trend_ma_12_deg_1_loess_pred_vec))
y_trend_ma_12_deg_1_loess_ext_ts <- ts(y_trend_ma_12_deg_1_loess_ext_vec, start=c(1980,1), frequency=12)

y_trend_ma_12_deg_2_loess_pred_vec <- predict(object = y_trend_ma_12_deg_2_loess, newdata = seq(TrnS_length+1, length, by=1), se = FALSE)
y_trend_ma_12_deg_2_loess_ext_vec <- c(y_trend_ma_12_ts, as.vector(y_trend_ma_12_deg_2_loess_pred_vec))
y_trend_ma_12_deg_2_loess_ext_ts <- ts(y_trend_ma_12_deg_2_loess_ext_vec, start=c(1980,1), frequency=12)
plot(y_trend_ma_12_deg_1_loess_ext_ts, type="l", col="blue", xlab="date", ylab="log kilolitres", main="LOESS (deg 1) Prediction of the Moving Avg. (win=12) Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from 01-1980 to 07-1995")

plot(y_trend_ma_12_deg_2_loess_ext_ts, type="l", col="blue", xlab="date", ylab="log kilolitres", main="LOESS (deg 2) Prediction of the Moving Avg. (win=12) Trend Comp. of the AU Red Wine Monthly Sales (log kliters) from 01-1980 to 07-1995")
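Note that the centred moving average leaves NAs at both ends of the series (six on each side for order 12, as shown in the head and tail above), and `loess()` silently drops them through its default `na.action = na.omit`. The same smoother should be reproducible in base R with `stats::filter` and the standard 2x12 weights; a minimal sketch on a toy series:

```r
x <- 100 + (1:36) + 10 * sin(2 * pi * (1:36) / 12)  # toy monthly series
w <- c(0.5, rep(1, 11), 0.5) / 12                   # 2x12 centred moving-average weights
ma_12 <- stats::filter(x, w, sides = 2)             # sides = 2: centred window
sum(is.na(ma_12))                                   # 12: six NAs at each end
```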

Now, focusing our attention on the LOESS (smoothing degree 1) prediction of the annual average trend component, which corresponds to an optimistic scenario, we consider the sum of all the predicted components.

y_res_ext_ts <- ts(c(y_res_ext_vec[1:TrnS_length],rep(0,forecast_length)), start=c(1980,1), frequency=12)
y_pred_ts <- y_trend_loess_deg_1_ext_ts + y_seas_pred_ts + y_res_ext_ts
plot(y_pred_ts, type="l", col="blue", main="AU Red Wine Mont. Sal. Prediction - LOESS Smooth. (deg 1) of the Annual Avg. Trend Comp. (log kliters) from Jan 1980 to Dec 1993")

To plot the predicted MARWS_log time series with its prediction intervals, we just need to build a suitable data frame and apply the plotting procedure shown above once again.

y_trend_pred_ts <- y_trend_loess_deg_1_ext_ts
y_pred_trend_seas_ts <-  y_trend_pred_ts[c((length(y_res)+1):length(y_pred_ts))]+y_seas_pred_ts[c((length(y_res)+1):length(y_pred_ts))]
y_pred_trend_seas_vec <- as.vector(c(rep(NA,times=length(y_res)),y_pred_trend_seas_ts))
y_pred_99_int_low <- y_pred_trend_seas_vec + y_res_pred_df$y_res_pred_99_int_low
y_pred_95_int_low <- y_pred_trend_seas_vec + y_res_pred_df$y_res_pred_95_int_low
y_pred_90_int_low <- y_pred_trend_seas_vec + y_res_pred_df$y_res_pred_90_int_low
y_pred_90_int_upp <- y_pred_trend_seas_vec + y_res_pred_df$y_res_pred_90_int_upp
y_pred_95_int_upp <- y_pred_trend_seas_vec + y_res_pred_df$y_res_pred_95_int_upp
y_pred_99_int_upp <- y_pred_trend_seas_vec + y_res_pred_df$y_res_pred_99_int_upp
  
y_pred_df <- data.frame(t=1:length(y_pred_ts), Year = trunc(time(y_pred_ts)), Month=month.abb[cycle(y_pred_ts)],
                        y=log(MARWS_df$RWS), y_pred=as.vector(y_pred_ts),
                        y_pred_99_int_low=y_pred_99_int_low, y_pred_95_int_low=y_pred_95_int_low,            
                        y_pred_90_int_low=y_pred_90_int_low, y_pred_90_int_upp=y_pred_90_int_upp, 
                        y_pred_95_int_upp=y_pred_95_int_upp, y_pred_99_int_upp=y_pred_99_int_upp)
head(y_pred_df)
##   t Year Month        y   y_pred y_pred_99_int_low y_pred_95_int_low
## 1 1 1980   Jan 6.139885 6.139885                NA                NA
## 2 2 1980   Feb 6.514713 6.514713                NA                NA
## 3 3 1980   Mar 6.555357 6.555357                NA                NA
## 4 4 1980   Apr 6.787845 6.787845                NA                NA
## 5 5 1980   May 7.037906 7.037906                NA                NA
## 6 6 1980   Jun 6.981935 6.981935                NA                NA
##   y_pred_90_int_low y_pred_90_int_upp y_pred_95_int_upp y_pred_99_int_upp
## 1                NA                NA                NA                NA
## 2                NA                NA                NA                NA
## 3                NA                NA                NA                NA
## 4                NA                NA                NA                NA
## 5                NA                NA                NA                NA
## 6                NA                NA                NA                NA
tail(y_pred_df,(forecast_length+1))
##       t Year Month        y   y_pred y_pred_99_int_low y_pred_95_int_low
## 168 168 1993   Dec 7.837949 7.837949                NA                NA
## 169 169 1994   Jan 6.947937 7.045745          6.801299          6.860527
## 170 170 1994   Feb 7.454720 7.336575          7.092129          7.151356
## 171 171 1994   Mar 7.696667 7.515781          7.271335          7.330563
## 172 172 1994   Apr 7.805882 7.620563          7.376117          7.435345
## 173 173 1994   May 7.698029 7.776317          7.531870          7.591098
## 174 174 1994   Jun 7.886081 7.800115          7.555669          7.614896
## 175 175 1994   Jul 8.207947 8.013926          7.769480          7.828708
## 176 176 1994   Aug 7.887959 8.014267          7.769821          7.829048
## 177 177 1994   Sep 7.878155 7.776428          7.531982          7.591209
## 178 178 1994   Oct 7.707962 7.688937          7.444491          7.503719
## 179 179 1994   Nov 7.857868 7.782219          7.537773          7.597001
## 180 180 1994   Dec 7.895063 7.849390          7.604943          7.664171
## 181 181 1995   Jan 7.077498 7.081340          6.836894          6.896122
## 182 182 1995   Feb 7.466799 7.372097          7.127651          7.186878
## 183 183 1995   Mar 7.807510 7.551233          7.306787          7.366015
## 184 184 1995   Apr 7.870166 7.655947          7.411501          7.470729
## 185 185 1995   May 7.857481 7.811635          7.567189          7.626417
## 186 186 1995   Jun 8.104703 7.835370          7.590924          7.650151
## 187 187 1995   Jul 8.274612 8.049120          7.804674          7.863902
##     y_pred_90_int_low y_pred_90_int_upp y_pred_95_int_upp y_pred_99_int_upp
## 168                NA                NA                NA                NA
## 169          6.890571          7.200920          7.230964          7.290192
## 170          7.181400          7.491749          7.521793          7.581021
## 171          7.360607          7.670956          7.701000          7.760228
## 172          7.465389          7.775738          7.805782          7.865009
## 173          7.621142          7.931491          7.961535          8.020763
## 174          7.644940          7.955289          7.985333          8.044561
## 175          7.858752          8.169101          8.199145          8.258372
## 176          7.859092          8.169441          8.199485          8.258713
## 177          7.621253          7.931602          7.961646          8.020874
## 178          7.533763          7.844112          7.874156          7.933384
## 179          7.627045          7.937394          7.967438          8.026666
## 180          7.694215          8.004564          8.034608          8.093836
## 181          6.926166          7.236515          7.266559          7.325787
## 182          7.216922          7.527271          7.557315          7.616543
## 183          7.396059          7.706408          7.736452          7.795679
## 184          7.500772          7.811122          7.841166          7.900393
## 185          7.656460          7.966810          7.996853          8.056081
## 186          7.680195          7.990544          8.020588          8.079816
## 187          7.893946          8.204295          8.234339          8.293566

We plot the real and predicted path of the MARWS_log time series and the corresponding prediction intervals.

Data_df <- y_pred_df
length <- nrow(Data_df)
T <- TrnS_length
mu <- 0
sigma <- round(sd(y_res), digits=3)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Predicted AU Red Wine Monthly Sales Logarithm Time Series (an optimistic scenario) from ", .(First_Date), " to ", .(Last_Date), sep="")))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,  estimated mean ", mu[W]==.(mu), ",  estimated stand. dev. ", sigma[W]==.(sigma), sep=""))
caption_content <- "Author: Roberto Monte"
x_name <- bquote("")
x_breaks_num <- 30
x_breaks_low <- Data_df$t[1]
x_breaks_up <- Data_df$t[length]
x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_name <- bquote("sales (log-kliters)")
y_breaks_num <- 10
y_max <- max(na.omit(Data_df$y),na.omit(Data_df$y_pred_99_int_upp))
y_min <- min(na.omit(Data_df$y),na.omit(Data_df$y_pred_99_int_low))
y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
line_black   <- bquote("in-sample path")
line_magenta <- bquote("out-of-sample path")
line_brown   <- bquote("predicted path")
line_green   <- bquote("90% pred.int.")
line_blue    <- bquote("95% pred.int.")
line_red     <- bquote("99% pred.int.")
leg_line_labs   <- c(line_black, line_brown, line_magenta, line_green, line_blue, line_red)
leg_line_breaks <- c("line_black", "line_brown", "line_magenta", "line_green", "line_blue", "line_red")
leg_line_cols   <- c("line_black"="black", "line_brown"="brown", "line_magenta"="magenta",
                     "line_green"="green", "line_blue"="blue", "line_red"="red")
fill_g <- bquote("90% pred. band")
fill_b <- bquote("95% pred. band")
fill_r <- bquote("99% pred. band")
leg_fill_labs   <- c(fill_g, fill_b, fill_r)
leg_fill_breaks <- c("fill_g", "fill_b", "fill_r")
leg_fill_cols   <- c("fill_g"="lightgreen", "fill_b"="blue", "fill_r"="orangered")
leg_col_labs    <- leg_line_labs
leg_col_breaks  <- leg_line_breaks
leg_col_cols    <- leg_line_cols
y_pred_lp <- ggplot(Data_df, aes(x=t)) + 
  geom_line(data=subset(Data_df, Data_df$t <= t[T+1]), aes(y=y, color="line_black"),
            linetype="solid", alpha=1, size=1, group=1) +
  geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y, color="line_magenta"),
            linetype="solid", alpha=1, size=1, group=1) +
  geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred, colour="line_brown"), 
            linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_99_int_low, colour="line_red"), 
            linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_99_int_upp, colour="line_red"), 
            linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_95_int_low, colour="line_blue"), 
            linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_95_int_upp, colour="line_blue"), 
            linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_90_int_low, colour="line_green"), 
            linetype="solid", alpha=1, size=1) +
  geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_90_int_upp, colour="line_green"), 
            linetype="solid", alpha=1, size=1) +
  geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="orangered", 
              aes(ymin=y_pred_99_int_low, ymax=y_pred_95_int_low, fill="fill_r")) +
  geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="orangered",
              aes(ymin=y_pred_95_int_upp, ymax=y_pred_99_int_upp, fill="fill_r")) +
  geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="blue",
              aes(ymin=y_pred_95_int_low, ymax=y_pred_90_int_low, fill="fill_b")) +
  geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="blue",
              aes(ymin=y_pred_90_int_upp, ymax=y_pred_95_int_upp, fill="fill_b")) +
  geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="lightgreen",
              aes(ymin=y_pred_90_int_low, ymax=y_pred_90_int_upp, fill="fill_g")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  guides(linetype="none", shape="none") +
  scale_colour_manual(name="Legend", labels=leg_line_labs, values=leg_line_cols, breaks=leg_line_breaks) +
  scale_fill_manual(name="", labels=leg_fill_labs, values=leg_fill_cols, breaks=leg_fill_breaks) +
  guides(colour=guide_legend(order=1), fill=guide_legend(order=2)) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
 plot(y_pred_lp)

To complete our analysis, we turn back to the original MARWS time series and build the data frame with the prediction of its future path and the corresponding prediction intervals.

y_exp_pred_df <- data.frame(t=1:length(y_pred_ts), Year = trunc(time(y_pred_ts)), Month=month.abb[cycle(y_pred_ts)],
                        y=MARWS_df$RWS, y_pred=exp(as.vector(y_pred_ts)),
                        y_pred_99_int_low=exp(y_pred_99_int_low), y_pred_95_int_low=exp(y_pred_95_int_low),            
                        y_pred_90_int_low=exp(y_pred_90_int_low), y_pred_90_int_upp=exp(y_pred_90_int_upp), 
                        y_pred_95_int_upp=exp(y_pred_95_int_upp), y_pred_99_int_upp=exp(y_pred_99_int_upp))
head(y_exp_pred_df)
tail(y_exp_pred_df,(forecast_length+1))
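A caveat worth recording (under our assumption of Gaussian log-scale residuals, not established above): exponentiating the log-scale point forecast returns the median of the back-transformed predictive distribution, not its mean; the mean carries an extra factor exp(sigma^2/2). A minimal sketch with toy values:

```r
log_pred <- 7.05                            # toy log-scale point forecast
sigma    <- 0.09                            # toy log-scale residual standard deviation
median_pred <- exp(log_pred)                # naive back-transform: the median
mean_pred   <- exp(log_pred + sigma^2 / 2)  # bias-adjusted mean
mean_pred / median_pred                     # exp(sigma^2 / 2), slightly above 1
```

For small residual variance the correction is negligible, which is why the naive back-transform is used here.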

Finally, we plot the real and predicted paths of the MARWS time series and the corresponding prediction intervals.

 Data_df <- y_exp_pred_df
 length <- nrow(Data_df)
 T <- TrnS_length
 mu <- 0
 sigma <- round(sd(y_res), digits=3)
 First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
 Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
 title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Predicted AU Red Wine Monthly Sales Time Series (an optimistic scenario) from ", .(First_Date), " to ", .(Last_Date), sep="")))
 subtitle_content <- bquote(paste("path length ", .(length), " sample points,  estimated mean ", mu[W]==.(mu), ",  estimated stand. dev. ", sigma[W]==.(sigma), sep=""))
 caption_content <- "Author: Roberto Monte"
 x_name <- bquote("")
 x_breaks_num <- 30
 x_breaks_low <- Data_df$t[1]
 x_breaks_up <- Data_df$t[length]
 x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
 x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
 if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
 x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
 J <- 0
 x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
 y_name <- bquote("sales (kliters)")
 y_breaks_num <- 10
 y_max <- max(na.omit(Data_df$y),na.omit(Data_df$y_pred_99_int_upp))
 y_min <- min(na.omit(Data_df$y),na.omit(Data_df$y_pred_99_int_low))
 y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
 y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
 y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
 y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
 y_labs <- format(y_breaks, scientific=FALSE)
 K <- 0
 y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
 line_black   <- bquote("in-sample path")
 line_magenta <- bquote("out-of-sample path")
 line_brown   <- bquote("predicted path")
 line_green   <- bquote("90% pred.int.")
 line_blue    <- bquote("95% pred.int.")
 line_red     <- bquote("99% pred.int.")
 leg_line_labs   <- c(line_black, line_brown, line_magenta, line_green, line_blue, line_red)
 leg_line_breaks <- c("line_black", "line_brown", "line_magenta", "line_green", "line_blue", "line_red")
 leg_line_cols   <- c("line_black"="black", "line_brown"="brown", "line_magenta"="magenta",
                      "line_green"="green", "line_blue"="blue", "line_red"="red")
 fill_g <- bquote("90% pred. band")
 fill_b <- bquote("95% pred. band")
 fill_r <- bquote("99% pred. band")
 leg_fill_labs   <- c(fill_g, fill_b, fill_r)
 leg_fill_breaks <- c("fill_g", "fill_b", "fill_r")
 leg_fill_cols   <- c("fill_g"="lightgreen", "fill_b"="blue", "fill_r"="orangered")
 leg_col_labs    <- leg_line_labs
 leg_col_breaks  <- leg_line_breaks
 leg_col_cols    <- leg_line_cols
 y_exp_pred_lp <- ggplot(Data_df, aes(x=t)) + 
   geom_line(data=subset(Data_df, Data_df$t <= t[T+1]), aes(y=y, color="line_black"),
             linetype="solid", alpha=1, size=1, group=1) +
   geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y, color="line_magenta"),
             linetype="solid", alpha=1, size=1, group=1) +
   geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred, colour="line_brown"), 
             linetype="solid", alpha=1, size=1) +
   geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_99_int_low, colour="line_red"), 
             linetype="solid", alpha=1, size=1) +
   geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_99_int_upp, colour="line_red"), 
             linetype="solid", alpha=1, size=1) +
   geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_95_int_low, colour="line_blue"), 
             linetype="solid", alpha=1, size=1) +
   geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_95_int_upp, colour="line_blue"), 
             linetype="solid", alpha=1, size=1) +
   geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_90_int_low, colour="line_green"), 
             linetype="solid", alpha=1, size=1) +
   geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_90_int_upp, colour="line_green"), 
             linetype="solid", alpha=1, size=1) +
   geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="orangered", 
               aes(ymin=y_pred_99_int_low, ymax=y_pred_95_int_low, fill="fill_r")) +
   geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="orangered",
               aes(ymin=y_pred_95_int_upp, ymax=y_pred_99_int_upp, fill="fill_r")) +
   geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="blue",
               aes(ymin=y_pred_95_int_low, ymax=y_pred_90_int_low, fill="fill_b")) +
   geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="blue",
               aes(ymin=y_pred_90_int_upp, ymax=y_pred_95_int_upp, fill="fill_b")) +
   geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="lightgreen",
               aes(ymin=y_pred_90_int_low, ymax=y_pred_90_int_upp, fill="fill_g")) +
   scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
   scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                      sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
   ggtitle(title_content) +
   labs(subtitle=subtitle_content, caption=caption_content) +
   guides(linetype="none", shape="none") +
   scale_colour_manual(name="Legend", labels=leg_line_labs, values=leg_line_cols, breaks=leg_line_breaks) +
   scale_fill_manual(name="", labels=leg_fill_labs, values=leg_fill_cols, breaks=leg_fill_breaks) +
   guides(colour=guide_legend(order=1), fill=guide_legend(order=2)) +
   theme(plot.title=element_text(hjust = 0.5), 
         plot.subtitle=element_text(hjust =  0.5),
         plot.caption = element_text(hjust = 1.0),
         axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
         legend.key.width = unit(0.8,"cm"), legend.position="bottom")
 plot(y_exp_pred_lp)

To get a closer view, we modify the data frame by dropping part of the earlier rows and apply the same chunk of code.

m <- 160
  T <- length(y_res)-m
  Data_df <- y_exp_pred_df[-c(1:m),]
  row.names(Data_df) <- c(1:nrow(Data_df))
  Data_df$t <- c(1:nrow(Data_df))
  length <- nrow(Data_df)
  mu <- round(mean(y_res), digits=3)
  sigma <- round(sd(y_res), digits=3)
  First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
  Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
  title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Line Plot of the Predicted AU Red Wine Monthly Sales Time Series (an optimistic scenario) from ", .(First_Date), " to ", .(Last_Date), sep="")))
  subtitle_content <- bquote(paste("path length ", .(length), " sample points,  estimated mean ", mu[W]==.(mu), ",  estimated stand. dev. ", sigma[W]==.(sigma), sep=""))
  caption_content <- "Author: Roberto Monte"
  x_name <- bquote("")
  x_breaks_num <- 10
  x_breaks_low <- Data_df$t[1]
  x_breaks_up <- Data_df$t[length]
  x_binwidth <- floor((x_breaks_up-x_breaks_low)/x_breaks_num)
  x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
  if((max(x_breaks)-x_breaks_up)>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
  x_labs <- paste(Data_df$Month[x_breaks],Data_df$Year[x_breaks])
  J <- 0
  x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
  y_name <- bquote("sales (kliters)")
  y_breaks_num <- 10
  y_max <- max(na.omit(Data_df$y),na.omit(Data_df$y_pred_99_int_upp))
  y_min <- min(na.omit(Data_df$y),na.omit(Data_df$y_pred_99_int_low))
  y_binwidth <- round((y_max-y_min)/y_breaks_num, digits=3)
  y_breaks_low <- floor(y_min/y_binwidth)*y_binwidth
  y_breaks_up <- ceiling(y_max/y_binwidth)*y_binwidth
  y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),digits=3)
  y_labs <- format(y_breaks, scientific=FALSE)
  K <- 0
  y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
  line_black   <- bquote("in-sample path")
  line_magenta <- bquote("out-of-sample path")
  line_brown   <- bquote("predicted path")
  line_green   <- bquote("90% pred.int.")
  line_blue    <- bquote("95% pred.int.")
  line_red     <- bquote("99% pred.int.")
  leg_line_labs   <- c(line_black, line_brown, line_magenta, line_green, line_blue, line_red)
  leg_line_breaks <- c("line_black", "line_brown", "line_magenta", "line_green", "line_blue", "line_red")
  leg_line_cols   <- c("line_black"="black", "line_brown"="brown", "line_magenta"="magenta",
                       "line_green"="green", "line_blue"="blue", "line_red"="red")
  fill_g <- bquote("90% pred. band")
  fill_b <- bquote("95% pred. band")
  fill_r <- bquote("99% pred. band")
  leg_fill_labs   <- c(fill_g, fill_b, fill_r)
  leg_fill_breaks <- c("fill_g", "fill_b", "fill_r")
  leg_fill_cols   <- c("fill_g"="lightgreen", "fill_b"="blue", "fill_r"="orangered")
  leg_col_labs    <- leg_line_labs
  leg_col_breaks  <- leg_line_breaks
  leg_col_cols    <- leg_line_cols
  y_exp_pred_lp <- ggplot(Data_df, aes(x=t)) + 
    geom_line(data=subset(Data_df, Data_df$t <= t[T+1]), aes(y=y, color="line_black"),
              linetype="solid", alpha=1, size=1, group=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y, color="line_magenta"),
              linetype="solid", alpha=1, size=1, group=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred, colour="line_brown"), 
              linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_99_int_low, colour="line_red"), 
              linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_99_int_upp, colour="line_red"), 
              linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_95_int_low, colour="line_blue"), 
              linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_95_int_upp, colour="line_blue"), 
              linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_90_int_low, colour="line_green"), 
              linetype="solid", alpha=1, size=1) +
    geom_line(data=subset(Data_df, Data_df$t >= t[T+1]), aes(y=y_pred_90_int_upp, colour="line_green"), 
              linetype="solid", alpha=1, size=1) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="orangered", 
                aes(ymin=y_pred_99_int_low, ymax=y_pred_95_int_low, fill="fill_r")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="orangered",
                aes(ymin=y_pred_95_int_upp, ymax=y_pred_99_int_upp, fill="fill_r")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="blue",
                aes(ymin=y_pred_95_int_low, ymax=y_pred_90_int_low, fill="fill_b")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="blue",
                aes(ymin=y_pred_90_int_upp, ymax=y_pred_95_int_upp, fill="fill_b")) +
    geom_ribbon(data=subset(Data_df, Data_df$t >= t[T+1]), alpha=0.3, colour="lightgreen",
                aes(ymin=y_pred_90_int_low, ymax=y_pred_90_int_upp, fill="fill_g")) +
    scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
    scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                       sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
    ggtitle(title_content) +
    labs(subtitle=subtitle_content, caption=caption_content) +
    guides(linetype="none", shape="none") +
    scale_colour_manual(name="Legend", labels=leg_line_labs, values=leg_line_cols, breaks=leg_line_breaks) +
    scale_fill_manual(name="", labels=leg_fill_labs, values=leg_fill_cols, breaks=leg_fill_breaks) +
    guides(colour=guide_legend(order=1), fill=guide_legend(order=2)) +
    theme(plot.title=element_text(hjust = 0.5), 
          plot.subtitle=element_text(hjust =  0.5),
          plot.caption = element_text(hjust = 1.0),
          axis.text.x = element_text(angle=-45, vjust=1, hjust=-0.3),
          legend.key.width = unit(0.8,"cm"), legend.position="bottom")
  plot(y_exp_pred_lp)

4 Stochastic Processes

As already mentioned, stochastic processes are mathematical models of real-world phenomena that evolve in time under the significant influence of random perturbations, the so-called stochastic phenomena. Typical examples of natural stochastic phenomena are meteorological, hydrological, climatic, seismic, and epidemiological phenomena, sunspots, the growth of and competition among animal and plant populations, and so on. Typical examples of social, economic, and financial stochastic phenomena are demographic phenomena, business cycles, gross domestic product, employment rates, commodity prices, bond prices, stock prices, derivative prices, interest rates, exchange rates, and so on. All these phenomena share the common feature of being clearly affected by the frequent occurrence of erratic events. Therefore, the study of the time evolution of their descriptive variables calls for the application of stochastic models.

Roughly speaking, a stochastic process with states in some multi-dimensional real Euclidean space is a time-indexed family of random variables defined on the same probability space and taking values in that Euclidean space. We make this notion precise below.

Let \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) be a probability space, let \(\mathbb{T}\) be a non-empty subset of the real line \(\mathbb{R}\), and let \(\mathbb{R}^{N}\) be the Euclidean \(N\)-dimensional real space, for some \(N\in\mathbb{N}\), equipped with the Borel \(\sigma\)-algebra \(\mathcal{B}\left(\mathbb{R}^{N}\right)\) (family of measurable subsets of \(\mathbb{R}^{N}\)) and the Borel-Lebesgue measure \(\mu_{L}^{N}:\mathcal{B}\left(\mathbb{R}^{N}\right)\rightarrow\mathbb{\bar{R}}_{+}\) (measure for the subsets of the family \(\mathcal{B}\left(\mathbb{R}^{N}\right)\)).

Definition 4.1 (Stochastic process) We call a stochastic process on \(\Omega\) with time set \(\mathbb{T}\) and state space \(\mathbb{R}^{N}\) any family \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) of \(N\)-variate real random variables on \(\Omega\). A stochastic process with state space \(\mathbb{R}\) will simply be called a real stochastic process.

When the time set \(\mathbb{T}\) is a subset of \(\mathbb{Z}\) [resp. an interval of \(\mathbb{R}\)], we speak of a discrete-time [resp. continuous-time] stochastic process. In the case of a discrete-time stochastic process, we typically assume \(\mathbb{T}\equiv\left\{1,\dots,T\right\}\), for some \(T\in\mathbb{N}\), or \(\mathbb{T}\equiv\mathbb{N}\), or even \(\mathbb{T}\equiv\mathbb{Z}\). We assume instead \(\mathbb{T}\equiv\left\{0,1,\dots,T\right\}\) or \(\mathbb{T}\equiv\mathbb{N}_{0}\) when we want to stress that something special occurs at the initial time \(t=0\). In the case of a continuous-time stochastic process, we typically assume \(\mathbb{T}\equiv\lbrack 0,T\rbrack\), for some \(T>0\), or \(\mathbb{T}\equiv\left[0,+\infty\right)\).
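As a small illustration (a sketch of our own; the variable names are not taken from the text above), the typical discrete and continuous time sets can be represented in R as follows, a continuous time set being necessarily approximated by a fine grid:

```r
# Discrete time set T = {1, ..., T_max}
T_max <- 10
T_discrete <- 1:T_max
# Continuous time set [0, T_max], approximated by an evenly spaced grid
T_grid <- seq(from = 0, to = T_max, by = 0.01)
length(T_grid)  # 1001 grid points
```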

Definition 4.2 (Sample path) For any \(\omega\in\Omega\), we call the \(\omega\)-sample path of the stochastic process \(\mathbf{X}\) the family \(\left(x_{t}\right)_{t\in\mathbb{T}}\) of points in \(\mathbb{R}^{N}\) given by \[\begin{equation} x_{t}\overset{\text{def}}{=}X_{t}\left(\omega\right), \quad\forall t\in\mathbb{T}. \end{equation}\] The sample paths of stochastic process \(\mathbf{X}\) are also called the trajectories or realizations of \(\mathbf{X}\).
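In simulations, each run of the random number generator realizes one \(\omega\) and thus produces one sample path of the simulated process. A minimal sketch (our own illustration, using i.i.d. standard normal random variables):

```r
set.seed(1)          # fix the state of the random number generator
path_1 <- rnorm(5)   # sample path of the process for one omega
path_2 <- rnorm(5)   # sample path of the process for another omega
# Different omegas give, in general, different trajectories
identical(path_1, path_2)  # FALSE
```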

Example 4.1 (Dirac process) Fix any \(\mathbb{T}\subseteq\mathbb{R}\) and consider a Dirac random variable \(X_{t}\) concentrated at some \(x_{t}\in\mathbb{R}^{N}\), given by \[\begin{equation} X_{t}\overset{\text{def}}{=}x_{t},\quad\mathbf{P}\left(X_{t}=x_{t}\right)=1, \quad\forall t\in\mathbb{T}, \end{equation}\] where \(x_{t}\) may vary in \(\mathbb{R}^{N}\) as \(t\) varies in \(\mathbb{T}\). Then the family \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) is a stochastic process with time set \(\mathbb{T}\) and state space \(\mathbb{R}^{N}\), which is referred to as a Dirac process.

Remark (Dirac process sample paths). Dirac processes are just deterministic processes in a stochastic setting. Every Dirac process has (almost surely) a unique sample path.

We now build three simple Dirac processes.

# Dirac Processes
# We set the time set T.
T <- seq(from=0, to=2, length.out=201)
# It is often natural to define a stochastic process by a "for loop"
# over the time indices in the time set.
# Nevertheless, in some cases, the vector operations available in R allow one to skip
# such a for loop (as is recommended). Consider the following examples.
# We define the first process
X_0 <- 0
X <- rep(NA,length(T)-1)
for(t in T[-1]){X[which(T==t)-1] <- X_0}
X <- c(X_0,X)
show(X)
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# This definition can be simplified as
X_0 <- 0
X <- X_0 + 0*T
show(X)
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# or even simpler
X <- rep(0,length(T))
show(X)
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# We define the second process
a <- 1
Y_0 <- 0 
Y <- rep(NA,length(T)-1)
for(t in T[-1]){Y[which(T==t)-1] <- Y_0 + a*t}
Y <- c(Y_0,Y)
show(Y)
##   [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
##  [16] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
##  [31] 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44
##  [46] 0.45 0.46 0.47 0.48 0.49 0.50 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
##  [61] 0.60 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.70 0.71 0.72 0.73 0.74
##  [76] 0.75 0.76 0.77 0.78 0.79 0.80 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89
##  [91] 0.90 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00 1.01 1.02 1.03 1.04
## [106] 1.05 1.06 1.07 1.08 1.09 1.10 1.11 1.12 1.13 1.14 1.15 1.16 1.17 1.18 1.19
## [121] 1.20 1.21 1.22 1.23 1.24 1.25 1.26 1.27 1.28 1.29 1.30 1.31 1.32 1.33 1.34
## [136] 1.35 1.36 1.37 1.38 1.39 1.40 1.41 1.42 1.43 1.44 1.45 1.46 1.47 1.48 1.49
## [151] 1.50 1.51 1.52 1.53 1.54 1.55 1.56 1.57 1.58 1.59 1.60 1.61 1.62 1.63 1.64
## [166] 1.65 1.66 1.67 1.68 1.69 1.70 1.71 1.72 1.73 1.74 1.75 1.76 1.77 1.78 1.79
## [181] 1.80 1.81 1.82 1.83 1.84 1.85 1.86 1.87 1.88 1.89 1.90 1.91 1.92 1.93 1.94
## [196] 1.95 1.96 1.97 1.98 1.99 2.00
# This definition can be simplified as
a <- 1
Y_0 <- 0
Y <- Y_0 + a*T
show(Y)
##   [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
##  [16] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
##  [31] 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44
##  [46] 0.45 0.46 0.47 0.48 0.49 0.50 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
##  [61] 0.60 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.70 0.71 0.72 0.73 0.74
##  [76] 0.75 0.76 0.77 0.78 0.79 0.80 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89
##  [91] 0.90 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00 1.01 1.02 1.03 1.04
## [106] 1.05 1.06 1.07 1.08 1.09 1.10 1.11 1.12 1.13 1.14 1.15 1.16 1.17 1.18 1.19
## [121] 1.20 1.21 1.22 1.23 1.24 1.25 1.26 1.27 1.28 1.29 1.30 1.31 1.32 1.33 1.34
## [136] 1.35 1.36 1.37 1.38 1.39 1.40 1.41 1.42 1.43 1.44 1.45 1.46 1.47 1.48 1.49
## [151] 1.50 1.51 1.52 1.53 1.54 1.55 1.56 1.57 1.58 1.59 1.60 1.61 1.62 1.63 1.64
## [166] 1.65 1.66 1.67 1.68 1.69 1.70 1.71 1.72 1.73 1.74 1.75 1.76 1.77 1.78 1.79
## [181] 1.80 1.81 1.82 1.83 1.84 1.85 1.86 1.87 1.88 1.89 1.90 1.91 1.92 1.93 1.94
## [196] 1.95 1.96 1.97 1.98 1.99 2.00
# We define the third process
a <- 0
b <- 1
Z_0 <- 0
Z <- rep(NA,length(T)-1)
for(t in T[-1]){Z[which(T==t)-1] <- Z_0 + a*t + b*(t^2)}
Z <- c(Z_0,Z)
show(Z)
##   [1] 0.0000 0.0001 0.0004 0.0009 0.0016 0.0025 0.0036 0.0049 0.0064 0.0081
##  [11] 0.0100 0.0121 0.0144 0.0169 0.0196 0.0225 0.0256 0.0289 0.0324 0.0361
##  [21] 0.0400 0.0441 0.0484 0.0529 0.0576 0.0625 0.0676 0.0729 0.0784 0.0841
##  [31] 0.0900 0.0961 0.1024 0.1089 0.1156 0.1225 0.1296 0.1369 0.1444 0.1521
##  [41] 0.1600 0.1681 0.1764 0.1849 0.1936 0.2025 0.2116 0.2209 0.2304 0.2401
##  [51] 0.2500 0.2601 0.2704 0.2809 0.2916 0.3025 0.3136 0.3249 0.3364 0.3481
##  [61] 0.3600 0.3721 0.3844 0.3969 0.4096 0.4225 0.4356 0.4489 0.4624 0.4761
##  [71] 0.4900 0.5041 0.5184 0.5329 0.5476 0.5625 0.5776 0.5929 0.6084 0.6241
##  [81] 0.6400 0.6561 0.6724 0.6889 0.7056 0.7225 0.7396 0.7569 0.7744 0.7921
##  [91] 0.8100 0.8281 0.8464 0.8649 0.8836 0.9025 0.9216 0.9409 0.9604 0.9801
## [101] 1.0000 1.0201 1.0404 1.0609 1.0816 1.1025 1.1236 1.1449 1.1664 1.1881
## [111] 1.2100 1.2321 1.2544 1.2769 1.2996 1.3225 1.3456 1.3689 1.3924 1.4161
## [121] 1.4400 1.4641 1.4884 1.5129 1.5376 1.5625 1.5876 1.6129 1.6384 1.6641
## [131] 1.6900 1.7161 1.7424 1.7689 1.7956 1.8225 1.8496 1.8769 1.9044 1.9321
## [141] 1.9600 1.9881 2.0164 2.0449 2.0736 2.1025 2.1316 2.1609 2.1904 2.2201
## [151] 2.2500 2.2801 2.3104 2.3409 2.3716 2.4025 2.4336 2.4649 2.4964 2.5281
## [161] 2.5600 2.5921 2.6244 2.6569 2.6896 2.7225 2.7556 2.7889 2.8224 2.8561
## [171] 2.8900 2.9241 2.9584 2.9929 3.0276 3.0625 3.0976 3.1329 3.1684 3.2041
## [181] 3.2400 3.2761 3.3124 3.3489 3.3856 3.4225 3.4596 3.4969 3.5344 3.5721
## [191] 3.6100 3.6481 3.6864 3.7249 3.7636 3.8025 3.8416 3.8809 3.9204 3.9601
## [201] 4.0000
# This definition can be simplified as
a <- 0
b <- 1
Z_0 <- 0
Z <- Z_0 + a*T + b*(T^2)
show(Z)
##   [1] 0.0000 0.0001 0.0004 0.0009 0.0016 0.0025 0.0036 0.0049 0.0064 0.0081
##  [11] 0.0100 0.0121 0.0144 0.0169 0.0196 0.0225 0.0256 0.0289 0.0324 0.0361
##  [21] 0.0400 0.0441 0.0484 0.0529 0.0576 0.0625 0.0676 0.0729 0.0784 0.0841
##  [31] 0.0900 0.0961 0.1024 0.1089 0.1156 0.1225 0.1296 0.1369 0.1444 0.1521
##  [41] 0.1600 0.1681 0.1764 0.1849 0.1936 0.2025 0.2116 0.2209 0.2304 0.2401
##  [51] 0.2500 0.2601 0.2704 0.2809 0.2916 0.3025 0.3136 0.3249 0.3364 0.3481
##  [61] 0.3600 0.3721 0.3844 0.3969 0.4096 0.4225 0.4356 0.4489 0.4624 0.4761
##  [71] 0.4900 0.5041 0.5184 0.5329 0.5476 0.5625 0.5776 0.5929 0.6084 0.6241
##  [81] 0.6400 0.6561 0.6724 0.6889 0.7056 0.7225 0.7396 0.7569 0.7744 0.7921
##  [91] 0.8100 0.8281 0.8464 0.8649 0.8836 0.9025 0.9216 0.9409 0.9604 0.9801
## [101] 1.0000 1.0201 1.0404 1.0609 1.0816 1.1025 1.1236 1.1449 1.1664 1.1881
## [111] 1.2100 1.2321 1.2544 1.2769 1.2996 1.3225 1.3456 1.3689 1.3924 1.4161
## [121] 1.4400 1.4641 1.4884 1.5129 1.5376 1.5625 1.5876 1.6129 1.6384 1.6641
## [131] 1.6900 1.7161 1.7424 1.7689 1.7956 1.8225 1.8496 1.8769 1.9044 1.9321
## [141] 1.9600 1.9881 2.0164 2.0449 2.0736 2.1025 2.1316 2.1609 2.1904 2.2201
## [151] 2.2500 2.2801 2.3104 2.3409 2.3716 2.4025 2.4336 2.4649 2.4964 2.5281
## [161] 2.5600 2.5921 2.6244 2.6569 2.6896 2.7225 2.7556 2.7889 2.8224 2.8561
## [171] 2.8900 2.9241 2.9584 2.9929 3.0276 3.0625 3.0976 3.1329 3.1684 3.2041
## [181] 3.2400 3.2761 3.3124 3.3489 3.3856 3.4225 3.4596 3.4969 3.5344 3.5721
## [191] 3.6100 3.6481 3.6864 3.7249 3.7636 3.8025 3.8416 3.8809 3.9204 3.9601
## [201] 4.0000
# We build a data frame where to store the time set T and the processes X,Y,Z
Dirac_df <- data.frame(T,X,Y,Z)
head(Dirac_df)
##      T X    Y      Z
## 1 0.00 0 0.00 0.0000
## 2 0.01 0 0.01 0.0001
## 3 0.02 0 0.02 0.0004
## 4 0.03 0 0.03 0.0009
## 5 0.04 0 0.04 0.0016
## 6 0.05 0 0.05 0.0025
# We add an index column
Dirac_df <- add_column(Dirac_df,  n=1:nrow(Dirac_df), .before=T)
head(Dirac_df)
##   n    T X    Y      Z
## 1 1 0.00 0 0.00 0.0000
## 2 2 0.01 0 0.01 0.0001
## 3 3 0.02 0 0.02 0.0004
## 4 4 0.03 0 0.03 0.0009
## 5 5 0.04 0 0.04 0.0016
## 6 6 0.05 0 0.05 0.0025

We draw the plot of the paths of the three Dirac processes.

Data_df <- Dirac_df
length <- nrow(Data_df)
First_Date <- as.character(Data_df$T[1])
Last_Date <- as.character(Data_df$T[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of the Almost Sure Sample Paths of Three Dirac Processes for t = ", .(First_Date), " to t = ", .(Last_Date))))
subtitle_content <- bquote("path length" ~ .(length(T)) ~ "sample points" ~~~~~ 
                             "Dirac process concentrated at" ~ 0 ~":" ~~ X[t]==0 ~";"~~~~ 
                             "Dirac process concentrated at" ~ t ~":" ~~ X[t]==t ~";"~~~~   
                             "Dirac process concentrated at" ~ t^2 ~":" ~~ X[t]==t^2~".")
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 20
x_breaks_low <- Data_df$n[1]
x_breaks_up <- Data_df$n[length]
x_binwidth <- ceiling((x_breaks_up-x_breaks_low)/x_breaks_num)
x_breaks <- c(x_breaks_low,seq(from=x_binwidth+1, to=x_breaks_up, by=x_binwidth))
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(Data_df$T[x_breaks], scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
y_breaks_num <- 20
y_binwidth_low <- min(Data_df$X,Data_df$Y,Data_df$Z)
y_binwidth_up <- max(Data_df$X,Data_df$Y,Data_df$Z)
y_binwidth <- round((y_binwidth_up-y_binwidth_low)/y_breaks_num, digits=2)
y_breaks_low <- floor(y_binwidth_low/y_binwidth)*y_binwidth
y_breaks_up <- ceiling(y_binwidth_up/y_binwidth)*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),2)
y_labs <- format(y_breaks, scientific=FALSE)
y_lims <- c((y_breaks_low-1.0*y_binwidth), (y_breaks_up+1.0*y_binwidth))
x_col <- bquote("Dirac proc. conc. at" ~  0)
y_col <- bquote("Dirac proc. conc. at" ~  t)
z_col <- bquote("Dirac proc. conc. at" ~  t^2)
leg_labs <- c(x_col, y_col, z_col)
leg_cols <- c("x_col"="red", "y_col"="green", "z_col"="blue")
leg_ord <- c("x_col", "y_col", "z_col")
Data_df_sp <- ggplot(Data_df, aes(x=n)) + 
  geom_point(alpha=1, size=1, aes(y=X, color="x_col")) +
  geom_point(alpha=1, size=1, aes(y=Y, color="y_col")) +
  geom_point(alpha=1, size=1, aes(y=Z, color="z_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~.,breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        axis.text.y.left = element_blank(),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_sp)

Suppose we compute the summary statistics of the Dirac process \(\mathbf{X}\) concentrated at \(t\).

summary(Dirac_df$Y)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.5     1.0     1.0     1.5     2.0

What does this summary make you think?

Suppose we plot the density histogram of the Dirac process \(\mathbf{X}\) concentrated at \(t^2\).

Data_df <- Dirac_df
First_Date <- as.character(Data_df$T[1])
Last_Date <- as.character(Data_df$T[length(Data_df$T)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Density Histogram of the Almost Sure Path of a Dirac Process for t = ", .(First_Date), " to t = ", .(Last_Date))))
subtitle_content <- bquote("path length" ~ .(length(T)) ~ "sample points;" ~~~~~ 
                             "Dirac process concentrated at" ~ t^2 ~":" ~~ X[t]==t^2~".")
caption_content <- "Author: Roberto Monte"
x_breaks_num <- 10
x_breaks_low <- Data_df$Z[1]
x_breaks_up <- Data_df$Z[length]
x_binwidth <- round(((x_breaks_up-x_breaks_low)/x_breaks_num),2)
x_breaks <- seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth)
if((x_breaks_up-max(x_breaks))>x_binwidth/2){x_breaks <- c(x_breaks,x_breaks_up)}
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0
x_lims <- c(x_breaks_low-J*x_binwidth, x_breaks_up+J*x_binwidth)
Data_df_dh <- ggplot(Data_df, aes(x=Z)) +
  geom_histogram(binwidth = x_binwidth, aes(y=..density..), #  density histogram
                 color="black", fill="blue", alpha=0.5)+
  #  scale_x_continuous(name="Sample Data", breaks=waiver(), labels=waiver()) +
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs
                     # , limits=x_lims
  ) +
  scale_y_continuous(name="Data Density", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0))
plot(Data_df_dh)

What does this histogram make you think?

As will become clearer in what follows, due to the lack of stationarity in the mean, both the summary statistics of the Dirac process concentrated at \(t\) and the density histogram of the Dirac process concentrated at \(t^2\) are meaningless.

Example 4.2 (Bernoulli process) Fix any \(\mathbb{T}\subseteq\mathbb{R}\) and consider a Bernoulli real random variable \(X_{t}\) with success probability parameter \(p\), given by \[\begin{equation} X_{t}\overset{\text{def}}{=} \left\{ \begin{array} [c]{ll} 1, & \mathbf{P}\left(X_{t}=1\right)=p,\\ 0, & \mathbf{P}\left(X_{t}=0\right)=1-p, \end{array} \right. \quad\forall t\in\mathbb{T}. \tag{4.1} \end{equation}\] Then the family \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) is a stochastic process with time set \(\mathbb{T}\) and state space \(\mathbb{R}\), referred to as a Bernoulli process with success probability parameter \(p\). When \(p\equiv 1/2\), the Bernoulli process is said to be standard.
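It may be worth checking Equation (4.1) empirically: by the law of large numbers, the relative frequency of successes along a long simulated path of independent Bernoulli random variables approximates the success probability parameter \(p\). A minimal sketch (our own illustration; the seed and the path length are arbitrary):

```r
set.seed(12345)                                 # arbitrary seed for reproducibility
p <- 0.5                                        # success probability parameter
path <- rbinom(n = 10000, size = 1, prob = p)   # a long simulated Bernoulli path
mean(path)                                      # relative frequency of successes, close to p
```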

In the previous chapter, we have already built three different paths of the standard Bernoulli process with time set \(\mathbb{T}\equiv\left\{1,\dots,150\right\}\) referred to as standard Bernoulli time series.

We plot the density histogram of the red path of the standard Bernoulli process.

length <- 150                                        # Setting the length of the time series
set.seed(12345)                                      # Setting the random seed "12345" for reproducibility.
Ber_r <- rbinom(n=length, size=1, prob=0.5)          # Simulating the flips of a fair coin, by sampling 
show(Ber_r)                                          # from the standard Bernoulli distribution.
##   [1] 1 1 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 0 1 1 0 0 1 0 0 1 0 0 1
##  [38] 1 1 0 1 0 1 1 0 0 0 0 0 1 1 1 0 0 1 0 1 0 0 0 1 0 1 1 1 0 1 0 1 1 1 1 0 0
##  [75] 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 1
## [112] 0 1 0 1 1 1 1 0 0 1 1 0 1 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 1 1 0 1 0 1 1 0
## [149] 1 0
Ber_r_df <- data.frame(T=1:length(Ber_r), X=Ber_r)   # Building a data frame with the Bernoulli path
head(Ber_r_df)
##   T X
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 5 0
## 6 6 0
Data_df <- Ber_r_df
length <- nrow(Ber_r_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Density Histogram of the Red Path of the Standard Bernoulli Process"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"number of trials parameter ",~ n == 1,~~~~"success parameter ",~ p == 0.5,~"."))
caption_content <- "Author: Roberto Monte"
x_breaks_num <- 2
x_max <- max(Data_df$X)
x_min <- min(Data_df$X)
x_binwidth <- round((x_max-x_min)/x_breaks_num, digits=1)
x_breaks_low <- floor(x_min/x_binwidth)*x_binwidth
x_breaks_up <- ceiling(x_max/x_binwidth)*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth), digits=1)
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0.5
x_lims <- c((x_breaks_low-J*x_binwidth), (x_breaks_up+J*x_binwidth))
Data_df_dh <- ggplot(Data_df, aes(x=X)) +
  geom_histogram(binwidth = x_binwidth, aes(y=..density..), # bins=2  # density histogram
                 color="black", fill="red", alpha=0.5)+
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name="Data Density", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0))
plot(Data_df_dh)

What does this histogram make you think?

Example 4.3 (Bernoulli counting process) Fix \(\mathbb{T}\equiv\left\{1,\dots,T\right\}\), for some \(T\in\mathbb{N}\), consider a Bernoulli process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) with success probability parameter \(p\), given by Equation (4.1), and consider the real random variable \(Y_{t}\) given by \[\begin{equation} Y_{t}\overset{\text{def}}{=}\sum_{s=1}^{t}X_{s}, \quad\forall t\in\mathbb{T}. \tag{4.2} \end{equation}\] Then the family \(\left(Y_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{Y}\) is a stochastic process with time set \(\mathbb{T}\) and state space \(\mathbb{R}\). Assume further that the random variables in \(\mathbf{X}\) are independent. Then \(\mathbf{Y}\) is referred to as a Bernoulli counting process with success probability parameter \(p\). When \(p\equiv 1/2\), the Bernoulli counting process is said to be standard.
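Equation (4.2) says that a path of the counting process is the cumulative sum of a path of the underlying Bernoulli process; moreover, \(\mathbf{E}\left(Y_{t}\right)=tp\) grows linearly in \(t\). A Monte Carlo sketch (our own illustration; the seed and the number of replications are arbitrary):

```r
set.seed(1)
p <- 0.5
T_max <- 150
n_paths <- 2000
# Each column is one sample path of the standard Bernoulli counting process
paths <- replicate(n_paths, cumsum(rbinom(T_max, size = 1, prob = p)))
mu_hat <- rowMeans(paths)   # Monte Carlo estimate of E(Y_t), t = 1, ..., T_max
mu_hat[c(10, 50, 150)]      # approximately t*p = 5, 25, 75
```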

In the previous chapter we have already built three different paths of the standard Bernoulli counting process with time set \(\mathbb{T}\equiv\left\{1,\dots,150\right\}\).

Suppose we plot the density histogram of the green path of the standard Bernoulli counting process.

length <- 150                                        # Setting the length of the time series.
set.seed(23451)                                      # Setting the random seed "23451" for reproducibility.
Ber_g <- rbinom(n=length, size=1, prob=0.5)          # Simulating the flips of a fair coin, by sampling 
                                                     # from the standard Bernoulli distribution.
Ber_cp_g <- cumsum(Ber_g)                            # Counting the heads in the simulated flips of a fair coin.
show(Ber_cp_g)
##   [1]  1  1  2  3  4  4  5  5  5  5  6  6  7  8  9  9  9 10 11 12 12 12 12 12 12
##  [26] 12 12 13 14 14 14 15 16 16 16 17 17 17 17 17 18 19 19 19 19 20 20 20 21 22
##  [51] 23 23 23 24 25 26 26 26 27 28 28 29 30 30 31 31 31 32 33 33 33 34 35 35 35
##  [76] 35 36 36 37 38 39 40 41 41 42 42 42 43 43 43 43 44 44 45 45 46 46 46 47 47
## [101] 47 47 47 47 47 48 49 50 50 50 51 52 52 53 53 53 54 54 55 56 56 57 57 57 58
## [126] 58 59 59 59 60 61 61 61 62 63 64 64 65 66 66 66 67 67 67 67 68 69 70 70 70
Ber_cp_g_df <- data.frame(T=1:length(Ber_cp_g), X=Ber_cp_g)   # Building a data frame with the Bernoulli counting  path
head(Ber_cp_g_df)
##   T X
## 1 1 1
## 2 2 1
## 3 3 2
## 4 4 3
## 5 5 4
## 6 6 4
Data_df <- Ber_cp_g_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Density Histogram of the Green Path of the Standard Bernoulli Counting Process"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"number of trials parameter ",~ n == 1,~~~~"success parameter ",~ p == 0.5,~"."))
caption_content <- "Author: Roberto Monte"
x_breaks_num <- 10
x_max <- max(Data_df$X)
x_min <- min(Data_df$X)
x_binwidth <- round((x_max-x_min)/x_breaks_num, digits=1)
x_breaks_low <- floor(x_min/x_binwidth)*x_binwidth
x_breaks_up <- ceiling(x_max/x_binwidth)*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth), digits=0)
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0.5
x_lims <- c((x_breaks_low-J*x_binwidth), (x_breaks_up+J*x_binwidth))
Data_df_dh <- ggplot(Data_df, aes(x=X)) +
  geom_histogram(binwidth = x_binwidth, aes(y=..density..), # density histogram
                 color="black", fill="green", alpha=0.5)+
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name="Data Density", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0))
plot(Data_df_dh)

Is the standard Bernoulli counting process stationary in the mean?

Example 4.4 (Rademacher Process) Fix any \(\mathbb{T}\subseteq\mathbb{R}\) and consider a Rademacher real random variable \(X_{t}\) given by \[\begin{equation} X_{t}\overset{\text{def}}{=} \left\{ \begin{array} [c]{ll} -1, & \mathbf{P}\left(X_{t}=-1\right)=\frac{1}{2},\\ 1, & \mathbf{P}\left(X_{t}= 1\right)=\frac{1}{2}, \end{array} \right. \quad\forall t\in\mathbb{T}. \tag{4.3} \end{equation}\] Then the family \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) is a stochastic process with time set \(\mathbb{T}\) and state space \(\mathbb{R}\). Assume further that the random variables in \(\mathbf{X}\) are independent. Then \(\mathbf{X}\) is referred to as a Rademacher white noise.

In the previous chapter, we have already built three different paths of the Rademacher white noise with time set \(\mathbb{T}\equiv\left\{1,\dots,150\right\}\) referred to as Rademacher time series.

We plot the density histogram of the green path of the Rademacher white noise.

length <- 150                                      # Setting the length of the time series
set.seed(23451)                                    # Setting the random seed "23451" for reproducibility.
Rad_g <- 2*rbinom(n=length, size=1, prob=0.5)-1    # Simulating the flips of a Rademacher fair coin, by sampling 
show(Rad_g)                                        # from the standard Bernoulli distribution and applying the rule R=2*B-1.
##   [1]  1 -1  1  1  1 -1  1 -1 -1 -1  1 -1  1  1  1 -1 -1  1  1  1 -1 -1 -1 -1 -1
##  [26] -1 -1  1  1 -1 -1  1  1 -1 -1  1 -1 -1 -1 -1  1  1 -1 -1 -1  1 -1 -1  1  1
##  [51]  1 -1 -1  1  1  1 -1 -1  1  1 -1  1  1 -1  1 -1 -1  1  1 -1 -1  1  1 -1 -1
##  [76] -1  1 -1  1  1  1  1  1 -1  1 -1 -1  1 -1 -1 -1  1 -1  1 -1  1 -1 -1  1 -1
## [101] -1 -1 -1 -1 -1  1  1  1 -1 -1  1  1 -1  1 -1 -1  1 -1  1  1 -1  1 -1 -1  1
## [126] -1  1 -1 -1  1  1 -1 -1  1  1  1 -1  1  1 -1 -1  1 -1 -1 -1  1  1  1 -1 -1
Rad_g_df <- data.frame(T=1:length(Rad_g), X=Rad_g) # Building a data frame with the Rademacher white noise path
head(Rad_g_df)
##   T  X
## 1 1  1
## 2 2 -1
## 3 3  1
## 4 4  1
## 5 5  1
## 6 6 -1
Data_df <- Rad_g_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Density Histogram of the Green Path of the Rademacher White Noise"))
subtitle_content <- bquote("path length" ~ .(length) ~ "sample points" ~~~~~ "path random seed" ~ 23451)
caption_content <- "Author: Roberto Monte"
x_breaks_num <- 2
x_max <- max(Data_df$X)
x_min <- min(Data_df$X)
x_binwidth <- round((x_max-x_min)/x_breaks_num, digits=1)
x_breaks_low <- floor(x_min/x_binwidth)*x_binwidth
x_breaks_up <- ceiling(x_max/x_binwidth)*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth), digits=1)
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0.5
x_lims <- c((x_breaks_low-J*x_binwidth), (x_breaks_up+J*x_binwidth))
Data_df_dh <- ggplot(Data_df, aes(x=X)) +
  geom_histogram(binwidth = x_binwidth, aes(y=..density..), # bins=2  # density histogram
                 color="black", fill="green", alpha=0.5)+
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name="Data Density", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0))
plot(Data_df_dh)

What does this histogram suggest?

Example 4.5 (Rademacher random walk) Fix any \(\mathbb{T}\subseteq\mathbb{R}\), consider a Rademacher process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\), given by Equation (4.3), and consider the real random variable \(Y_{t}\) given by \[\begin{equation} Y_{t}\overset{\text{def}}{=}\sum_{s=1}^{t}X_{s}, \quad\forall t\in\mathbb{T}. \tag{4.4} \end{equation}\] Then the family \(\left(Y_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{Y}\) is a stochastic process with time set \(\mathbb{T}\) and state space \(\mathbb{R}\), referred to as the Rademacher random walk.

In the previous chapter we have already built three different paths of the Rademacher random walk with time set \(\mathbb{T}\equiv\left\{1,\dots,150\right\}\).

We plot the density histogram of the blue path of the Rademacher random walk.

length <- 150                                        # Setting the length of the time series.
set.seed(34512)                                      # Setting the random seed "34512" for reproducibility.
Rad_b <- 2*rbinom(n=length, size=1, prob=0.5)-1      # Simulating the flips of a Rademacher fair coin, by sampling 
                                                     # from the standard Bernoulli distribution and applying the rule R=2*B-1.
Rad_rw_b <- cumsum(Rad_b)                            # Building the random-walk path as the cumulative sum of the flips.
show(Rad_rw_b)
##   [1]   1   0   1   0   1   2   3   2   1   2   3   2   1   2   1   0  -1   0
##  [19]  -1  -2  -1  -2  -1   0   1   2   3   2   1   0  -1  -2  -1   0  -1  -2
##  [37]  -1  -2  -1  -2  -1  -2  -3  -4  -3  -2  -3  -2  -3  -2  -3  -2  -3  -4
##  [55]  -5  -6  -7  -6  -5  -4  -3  -2  -1  -2  -1   0  -1  -2  -3  -4  -5  -6
##  [73]  -7  -6  -5  -4  -5  -6  -7  -8  -9 -10 -11 -10 -11 -10  -9  -8  -9  -8
##  [91]  -7  -6  -7  -8  -9  -8  -9  -8  -9 -10  -9 -10 -11 -10  -9  -8  -7  -8
## [109]  -7  -6  -7  -6  -5  -6  -5  -6  -7  -8  -9  -8  -7  -8  -7  -6  -7  -6
## [127]  -5  -4  -5  -6  -5  -4  -3  -2  -3  -4  -5  -4  -3  -4  -3  -2  -1  -2
## [145]  -3  -2  -3  -2  -1   0
Rad_rw_b_df <- data.frame(T=1:length(Rad_rw_b), X=Rad_rw_b)   # Building a data frame with the Rademacher random walk path
head(Rad_rw_b_df)
##   T X
## 1 1 1
## 2 2 0
## 3 3 1
## 4 4 0
## 5 5 1
## 6 6 2
Data_df <- Rad_rw_b_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Density Histogram of the Blue Path of the Rademacher Random Walk"))
subtitle_content <- bquote("path length" ~ .(length) ~ "sample points" ~~~~~ "path random seed" ~ 34512)
caption_content <- "Author: Roberto Monte"
x_breaks_num <- 7
x_max <- max(Data_df$X)
x_min <- min(Data_df$X)
x_binwidth <- round((x_max-x_min)/x_breaks_num, digits=1)
x_breaks_low <- floor(x_min/x_binwidth)*x_binwidth
x_breaks_up <- ceiling(x_max/x_binwidth)*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth), digits=0)
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0.5
x_lims <- c((x_breaks_low-J*x_binwidth), (x_breaks_up+J*x_binwidth))
Data_df_dh <- ggplot(Data_df, aes(x=X)) +
  geom_histogram(binwidth = x_binwidth, aes(y=..density..), # bins=2  # density histogram
                 color="black", fill="blue", alpha=0.5)+
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name="Data Density", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0))
plot(Data_df_dh)

Is the random walk stationary in the mean? Is it stationary in the variance?
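These questions can be probed by simulation. The following sketch (the path count and seed are illustrative, not part of the examples above) draws many independent Rademacher random-walk paths and compares the sample mean and variance at a few time points with the theoretical values \(\mathbf{E}\left[Y_{t}\right]=0\) and \(Var\left(Y_{t}\right)=t\).

```r
# Illustrative sketch: Monte Carlo check of the random-walk moments.
set.seed(1)                        # illustrative seed
n_paths <- 5000                    # illustrative number of independent walk paths
t_max <- 150                       # path length, as in the examples above
# Each column of "paths" is one Rademacher random-walk path Y_1, ..., Y_150.
paths <- replicate(n_paths, cumsum(2*rbinom(t_max, size=1, prob=0.5) - 1))
for (t in c(10, 50, 150)) {
  cat(sprintf("t = %3d   sample mean = %+6.3f   sample variance = %7.2f\n",
              t, mean(paths[t, ]), var(paths[t, ])))
}
```

The sample means stay close to \(0\) at every \(t\), while the sample variance grows roughly like \(t\): the random walk is stationary in the mean but not in the variance.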

Example 4.6 (Standard Binomial Process) For any \(t\in\mathbb{T}\subseteq\mathbb{R}\), consider a binomial random variable \(X_{t}\) with number of trials parameter \(n\) and success probability parameter \(p\), given by \[\begin{equation} X_{t}\overset{\text{def}}{=}k, \quad\mathbf{P}\left(X_{t}=k\right)=\binom{n}{k}p^{k}\left(1-p\right)^{n-k}, \quad\forall k=0,1,\dots,n,\quad t\in\mathbb{T}. \tag{4.5} \end{equation}\] Then the family \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) is a stochastic process with time set \(\mathbb{T}\) and state space \(\mathbb{R}\). Assume further that the random variables in \(\mathbf{X}\) are independent. Then \(\mathbf{X}\) is referred to as binomial process with number of trials parameter \(n\) and success probability parameter \(p\). When \(p=1/2\), the binomial process is said to be standard.
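The binomial probabilities \(\binom{n}{k}p^{k}\left(1-p\right)^{n-k}\) can be checked numerically against R's built-in dbinom; a minimal sketch for the standard case \(n=10\), \(p=1/2\):

```r
# Minimal check: the closed-form binomial probabilities match dbinom
# and sum to one (n = 10, p = 0.5, as in the standard binomial process).
n <- 10; p <- 0.5; k <- 0:n
pmf_manual <- choose(n, k) * p^k * (1 - p)^(n - k)
stopifnot(all.equal(pmf_manual, dbinom(k, size = n, prob = p)))
stopifnot(all.equal(sum(pmf_manual), 1))
show(round(pmf_manual, 4))
```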

In the previous chapter, we have already built three different paths of the standard binomial process with time set \(\mathbb{T}\equiv\left\{1,\dots,150\right\}\) and number of trials parameter \(n=10\), referred to as standard binomial time series.

We plot the density histogram of the red path of the standard binomial process.

length <- 150                                      # Setting the length of the time series.
trial_num <- 10                                    # Setting the number of trials parameter.
p <- 0.5                                           # Setting the success parameter.
set.seed(12345)                                    # Setting the random seed "12345" for reproducibility.
Bin_r <- rbinom(n=length, size=trial_num, prob=p)  # Simulating and showing the draws of n balls from an urn 
                                                   # with replacement, by sampling from the standard binomial 
show(Bin_r)                                        # distribution of "size" n.
##   [1] 6 7 6 7 5 3 4 5 6 9 2 3 6 1 5 5 5 5 4 8 5 4 8 6 6 5 6 5 4 5 6 1 4 6 4 4 7
##  [38] 7 5 3 6 5 7 6 4 4 3 2 3 6 8 6 4 4 6 5 6 3 5 4 6 4 8 6 8 4 8 3 5 8 6 5 4 4
##  [75] 2 5 8 6 5 3 7 5 1 2 3 4 6 5 6 3 7 6 3 5 6 5 6 6 3 5 4 5 8 5 5 7 6 7 4 4 5
## [112] 5 6 5 6 7 8 5 1 4 5 5 3 6 6 6 7 4 4 6 2 6 6 6 7 7 6 4 3 6 5 6 4 6 3 8 9 5
## [149] 7 4
Bin_r_df <- data.frame(T=1:length(Bin_r), X=Bin_r) # Building a data frame with the red binomial path
head(Bin_r_df)
##   T X
## 1 1 6
## 2 2 7
## 3 3 6
## 4 4 7
## 5 5 5
## 6 6 3
Data_df <- Bin_r_df
length <- nrow(Data_df)
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Density Histogram of the Red Path of the Standard Binomial Process"))
subtitle_content <- bquote(paste("Data set size = ", .(length),~~"sample points;",~~~~"number of trials parameter ",~ n == 10,~~~~"success parameter ",~ p == 0.5,~";"~~~~~ "path random seed" ~ 12345~"."))
caption_content <- "Author: Roberto Monte"
x_breaks_num <- 8
x_max <- max(Data_df$X)
x_min <- min(Data_df$X)
x_binwidth <- round((x_max-x_min)/x_breaks_num, digits=1)
x_breaks_low <- floor(x_min/x_binwidth)*x_binwidth
x_breaks_up <- ceiling(x_max/x_binwidth)*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth), digits=1)
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0.5
x_lims <- c((x_breaks_low-J*x_binwidth), (x_breaks_up+J*x_binwidth))
Data_df_dh <- ggplot(Data_df, aes(x=X)) +
  geom_histogram(binwidth = x_binwidth, aes(y=..density..), # bins=2  # density histogram
                 color="black", fill="red", alpha=0.5)+
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name="Data Density", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0))
plot(Data_df_dh)

Example 4.7 (Standard Gaussian Process) Fix any \(\mathbb{T}\subseteq\mathbb{R}\) and, for every \(t\in\mathbb{T}\), consider the standard Gaussian real random variable \(X_{t}\) with density \(f_{X_{t}}:\mathbb{R}\rightarrow\mathbb{R}\) given by \[\begin{equation} f_{X_{t}}(x)\overset{\text{def}}{=} \frac{1}{\sqrt{2\pi}}e^{-\frac{x^{2}}{2}}, \quad\forall x\in\mathbb{R}, \quad\forall t\in\mathbb{T}. \tag{4.6} \end{equation}\] Then the family \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) is a stochastic process with time set \(\mathbb{T}\) and state space \(\mathbb{R}\). Assume further that the random variables in \(\mathbf{X}\) are independent. Then \(\mathbf{X}\) is referred to as standard Gaussian process.
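Equation (4.6) is precisely the density implemented by R's dnorm with its default parameters mean=0 and sd=1; a minimal check:

```r
# Minimal check: the closed-form standard Gaussian density matches dnorm.
x <- seq(from = -3, to = 3, by = 0.5)
f_manual <- exp(-x^2/2) / sqrt(2*pi)
stopifnot(all.equal(f_manual, dnorm(x)))   # dnorm defaults to mean = 0, sd = 1
```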

In the previous chapter, we have already built three different paths of the standard Gaussian process with time set \(\mathbb{T}\equiv\left\{1,\dots,150\right\}\), also referred to as Gaussian white noise.

We plot the density histogram of the blue path of the standard Gaussian process.

length <- 150                             # Setting the length of the time series
set.seed(34512)                           # Setting the random seed "34512" for reproducibility.
Gauss_b <- rnorm(n=length, mean=0, sd=1)  # Simulating and showing the sampling from the standard Gaussian distribution.
show(Gauss_b[1:50])
##  [1]  1.81710386  0.55161637  1.11137544  0.13105810 -0.98000415  0.19093294
##  [7] -0.40066585 -0.01188588 -0.29639708 -1.27366316  0.29692993  0.85182847
## [13]  0.59033471  1.97486450 -1.33984935 -0.78122934  0.11834652 -1.05751290
## [19]  0.74990830  0.73138780  0.23231437 -0.19755703  0.63961470 -0.19329883
## [25] -1.19057725 -0.42250101 -0.70632371 -0.50936601 -0.03560262  0.47006425
## [31]  0.70966091  1.30219448  1.72542824 -0.38414591 -0.91376741 -0.88510184
## [37] -1.49360186  1.35874411 -1.69906286 -0.15251601 -1.02279802 -1.06210289
## [43] -0.80895299  0.75961672 -0.57009642  0.56327707 -0.59346337 -0.31246149
## [49] -0.40911979 -0.14574616
Gauss_b_df <- data.frame(T=1:length(Gauss_b), X=Gauss_b) # Building a data frame with the Gaussian path
head(Gauss_b_df)
##   T          X
## 1 1  1.8171039
## 2 2  0.5516164
## 3 3  1.1113754
## 4 4  0.1310581
## 5 5 -0.9800041
## 6 6  0.1909329
Data_df <- Gauss_b_df
length <- nrow(Data_df)
mu <- 0
sigma <- 1
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Density Histogram of the Blue Path of the Standard Gaussian Process"))
subtitle_content <- bquote("path length" ~ .(length) ~ "sample points" ~~~~~ "path random seed" ~ 34512)
caption_content <- "Author: Roberto Monte"
# x_breaks_num <- ceiling(length^(1/2)) # Tukey & Mosteller square-root rule
x_breaks_num <- ceiling(1+log2(length)) # Sturges rule
# x_breaks_num <- ceiling(2*length^(1/3)) # Rice rule
x_max <- max(Data_df$X)
x_min <- min(Data_df$X)
x_binwidth <- round((x_max-x_min)/x_breaks_num, digits=1)
x_breaks_low <- floor(x_min/x_binwidth)*x_binwidth
x_breaks_up <- ceiling(x_max/x_binwidth)*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth), digits=1)
x_labs <- format(x_breaks, scientific=FALSE)
J <- 0.5
x_lims <- c((x_breaks_low-J*x_binwidth), (x_breaks_up+J*x_binwidth))
col_1 <- bquote("empirical density")
col_2 <- bquote("standard Gaussian density")
leg_labs   <- c(col_1, col_2)
leg_cols   <- c("col_1"="green", "col_2"="red")
leg_breaks <- c("col_1", "col_2")
Data_df_dh <- ggplot(Data_df, aes(x=X)) +
  geom_histogram(binwidth = x_binwidth, aes(y=..density..), # bins=2  # density histogram
                 color="black", fill="blue", alpha=0.5) +
  geom_density(aes(x=X, colour="col_1"), size=0.8, show.legend = FALSE) + 
  stat_function(fun=dnorm, args = list(mean=mu, sd=sigma), aes(x=X, colour="col_2"), size=0.8) +
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name="Data Density", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_breaks,
                      guide=guide_legend(override.aes=list(colour=c("green", "red"), linetype=c("solid", "solid")))) +
   theme(plot.title=element_text(hjust=0.5), 
        plot.subtitle=element_text(hjust=0.5),
        legend.box.background = element_blank(), 
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_dh)

Both the stochastic processes presented in Examples 4.4 and 4.7 belong to the class of stochastic processes referred to as strong white noises. Roughly speaking, the ultimate goal of a time series analysis is to reduce the observed series to a strong white noise.

4.1 Filtrations

Let \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) be a probability space, let \(\mathbb{T}\) be a time set, and let \(\left(\mathcal{F}_{t}\right)_{t\in\mathbb{T}}\equiv\mathfrak{F}\) be a family of sub-\(\sigma\)-algebras of \(\mathcal{E}\) indexed on \(\mathbb{T}\).

Definition 4.3 (Filtration) We say that \(\mathfrak{F}\) is a filtration on \(\Omega\) if we have \[\begin{equation} \mathcal{F}_{s}\subseteq\mathcal{F}_{t},\quad\forall s,t\in\mathbb{T}\text{ s.t. }s<t. \end{equation}\]

Let \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) be a stochastic process on \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\), and let \(\left(\mathcal{F}_{t}\right)_{t\in\mathbb{T}}\equiv\mathfrak{F}\) be a filtration on \(\Omega\).

Definition 4.4 (Adapted process) We say that the process \(\mathbf{X}\) is \(\mathfrak{F}\)-adapted if the random variable \(X_{t}\) of the process is \(\mathcal{F}_{t}\)-measurable, for every \(t\in\mathbb{T}\), that is \[\begin{equation} \left\{X_{t}\in B\right\}\in\mathcal{F}_{t}, \end{equation}\] for any \(B\in\mathcal{B}\left(\mathbb{R}^{N}\right)\), for every \(t\in\mathbb{T}\).

Definition 4.5 (Filtration generated by a process) We call the filtration generated by the process \(\mathbf{X}\) the family \(\left(\mathcal{F}_{t}^{\mathbf{X}}\right)_{t\in\mathbb{T}}\equiv\mathfrak{F}^{\mathbf{X}}\) of sub-\(\sigma\)-algebras of \(\mathcal{E}\) given by \[\begin{equation} \mathcal{F}_{t}^{\mathbf{X}}\overset{\text{def}}{=}\sigma\left(X_{s};\ s\leq t\right),\quad\forall t\in\mathbb{T}, \end{equation}\] where \(\sigma\left(X_{s};\ s\leq t\right)\) is the \(\sigma\)-algebra generated by the random variables \(X_{s}\) of the process as \(s\) varies in \(\mathbb{T}\), up to and including \(t\).

Remark (Filtration generated by a process). Any process \(\mathbf{X}\) is \(\mathfrak{F}^{\mathbf{X}}\)-adapted. Moreover, \(\mathfrak{F}^{\mathbf{X}}\) is the smallest filtration with respect to which the process \(\mathbf{X}\) is adapted.

The notion of filtration aims to model the information flow progressively available in time to an observer with persistent memory (we assume that the observer does not forget the past). Such an information flow consists of all the events that can be discriminated, or questions that can be answered, by the observer at the current time. Hence, each sub-\(\sigma\)-algebra \(\mathcal{F}_{t}\) of a filtration \(\left(\mathcal{F}_{t}\right)_{t\in\mathbb{T}}\) represents the information available to the observer up to and including the time \(t\). On the other hand, each random variable \(X_{t}\) of a stochastic process represents the possible values taken by a feature of the stochastic phenomenon at the time \(t\). Therefore, the notion of adaptedness expresses the ability of the observer to discriminate the possible values of the random variable \(X_{t}\) in light of the information \(\mathcal{F}_{t}\) available to her. In particular, the notion of filtration generated by a stochastic process aims to model the minimum information flow that has to be progressively available to an observer to make the desired observations. The notion of filtration plays a crucial role in defining the optimal estimator of a stochastic process.

5 Processes of order \(K\)

Let \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with state space \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\).

Definition 5.1 (Process of order K) We say that \(\mathbf{X}\) has order \(K\), for some \(K\in\mathbb{N}\), if all random variables in \(\mathbf{X}\) have finite moment of order \(K\).

Proposition 5.1 (Process of order K) If \(\mathbf{X}\) is a process of order \(K\), for some \(K\in\mathbb{N}\), then \(\mathbf{X}\) is a process of order \(J\) for every \(1\leq J\leq K\).

Recall that for an \(N\)-variate real random variable \(X\equiv\left(X_{1},\dots,X_{N}\right)^{\intercal}\) with finite moment of order \(1\), the mean of \(X\) is given by

\[\begin{equation} \mathbf{E}\left[X\right]\overset{\text{def}}{=} \left(\mathbf{E}\left[X_{1}\right],\dots,\mathbf{E}\left[X_{N}\right]\right). \tag{5.1} \end{equation}\]

The mean of the random variable \(X\) is also denoted by \(\mu_{X}\).

Assume that the stochastic process \(\mathbf{X}\) has order \(1\).

Definition 5.2 (Mean function) We call the mean function of \(\mathbf{X}\) the map \(\mu_{\mathbf{X}}:\mathbb{T}\rightarrow\mathbb{R}^{N}\) given by \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)\overset{\text{def}}{=} \mathbf{E}\left[X_{t}\right],\quad\forall t\in\mathbb{T}. \tag{5.2} \end{equation}\]

Example 5.1 (Dirac process mean function) If \(\mathbf{X}\) is a Dirac process (see Example 4.1), then \(\mathbf{X}\) is a process of order \(1\) and we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=x_{t}, \end{equation}\] for every \(t\in\mathbb{T}\).

Example 5.2 (Bernoulli process mean function) If \(\mathbf{X}\) is a Bernoulli process with success probability parameter \(p\) (see Example 4.2), then \(\mathbf{X}\) is a process of order \(1\) and we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=p, \end{equation}\] for every \(t\in\mathbb{T}\).

Example 5.3 (Rademacher process mean function) If \(\mathbf{X}\) is a Rademacher process (see Example 4.4), then \(\mathbf{X}\) is a process of order \(1\) and we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=0, \end{equation}\] for every \(t\in\mathbb{T}\).

Example 5.4 (Standard binomial process mean Function) If \(\mathbf{X}\) is the standard binomial process with success probability parameter \(p\) and number of trials parameter \(n\) (see Example 4.6), then \(\mathbf{X}\) is a process of order \(1\) and we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=np, \end{equation}\] for every \(t\in\mathbb{T}\).

Example 5.5 (Standard Gaussian process mean function) If \(\mathbf{X}\) is the standard Gaussian process (see Example 4.7), then \(\mathbf{X}\) is a process of order \(1\) and we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=0, \end{equation}\] for every \(t\in\mathbb{T}\).
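The mean functions of Examples 5.3–5.5 can be estimated by averaging many simulated values at each time index; the following sketch (sample sizes and seed are illustrative) compares the column averages with the theoretical values \(0\), \(np=5\), and \(0\).

```r
# Illustrative sketch: Monte Carlo estimates of the mean functions.
set.seed(1)                                 # illustrative seed
n_paths <- 20000; t_len <- 5                # illustrative sizes
# Rows are paths, columns are time indices t = 1, ..., t_len.
rad <- matrix(2*rbinom(n_paths*t_len, size=1, prob=0.5) - 1, nrow=n_paths)  # Rademacher
bin <- matrix(rbinom(n_paths*t_len, size=10, prob=0.5),      nrow=n_paths)  # standard binomial, n = 10
gau <- matrix(rnorm(n_paths*t_len),                          nrow=n_paths)  # standard Gaussian
round(rbind(Rademacher = colMeans(rad),     # theoretical mean function: 0
            Binomial   = colMeans(bin),     # theoretical mean function: np = 5
            Gaussian   = colMeans(gau)),    # theoretical mean function: 0
      digits = 2)
```

In each row, the estimates are (up to sampling error) constant in \(t\), as the mean functions of these processes do not depend on time.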

Recall that for any \(N\)-variate real random variable \(X\equiv\left(X_{1},\dots,X_{N}\right)^{\intercal}\) with finite moment of order \(2\), the variance-covariance matrix of \(X\) is given by

\[\begin{equation} Var\left(X\right)\overset{\text{def}}{=} \mathbf{E}\left[\left(X-\mathbf{E}\left[X\right]\right) \left(X-\mathbf{E}\left[X\right]\right)^{\intercal}\right] =\mathbf{E}\left[XX^{\intercal}\right]-\mathbf{E}\left[X\right]\mathbf{E}\left[X\right]^{\intercal}. \tag{5.3} \end{equation}\]

In case \(N=1\), the variance-covariance matrix of the random variable \(X\) is simply called the variance of \(X\) and it is also commonly denoted by \(\mathbf{D}^{2}\left[X\right]\) or \(\sigma_{X}^{2}\).
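The two expressions in Equation (5.3) agree exactly when the expectations are replaced by sample averages with denominator \(n\); a sketch on simulated trivariate data (sample size and seed are illustrative):

```r
# Sketch: the two expressions in Equation (5.3) agree when expectations
# are replaced by sample averages (denominator n).
set.seed(1)                                      # illustrative seed
n <- 1000                                        # illustrative sample size
X <- cbind(rnorm(n), rnorm(n), rnorm(n))         # n draws of a 3-variate variable
mu <- colMeans(X)                                # empirical E[X]
Xc <- sweep(X, 2, mu)                            # X - E[X], row by row
V1 <- t(Xc) %*% Xc / n                           # E[(X - E[X])(X - E[X])^T]
V2 <- t(X)  %*% X  / n - mu %*% t(mu)            # E[X X^T] - E[X] E[X]^T
stopifnot(all.equal(V1, V2))
```

Note that R's cov() uses the unbiased denominator \(n-1\), so cov(X) equals V1 * n/(n-1) rather than V1 itself.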

In case \(N>1\), in terms of the entries \(X_{1},\dots,X_{N}\) of \(X\) we can write

\[\begin{equation} Var\left(X\right)= \begin{pmatrix} Cov\left(X_{1},X_{1}\right) & Cov\left(X_{1},X_{2}\right)& \cdots & Cov\left(X_{1},X_{N-1}\right)& Cov\left(X_{1},X_{N}\right) \\ Cov\left(X_{2},X_{1}\right)& Cov\left(X_{2},X_{2}\right)& \cdots & Cov\left(X_{2},X_{N-1}\right)& Cov\left(X_{2},X_{N}\right) \\ \vdots & \vdots & \ddots & \vdots & \vdots\\ Cov\left(X_{N-1},X_{1}\right)& Cov\left(X_{N-1},X_{2}\right)& \cdots & Cov\left(X_{N-1},X_{N-1}\right)& Cov\left(X_{N-1},X_{N}\right) \\ Cov\left(X_{N},X_{1}\right)& Cov\left(X_{N},X_{2}\right)& \cdots & Cov\left(X_{N},X_{N-1}\right)& Cov\left(X_{N},X_{N}\right) \end{pmatrix}, \tag{5.4} \end{equation}\]

where \[\begin{equation} Cov\left(X_{n},X_{n}\right) \equiv\mathbf{E}\left[\left(X_{n}-\mathbf{E}\left[X_{n}\right]\right)^{2}\right] = Var\left(X_{n}\right) \equiv \mathbf{D}^{2}\left[X_{n}\right] \equiv\sigma^2_{X_{n}} \equiv\sigma^2_{n}, \tag{5.5} \end{equation}\] for every \(n=1,\dots,N\), and

\[\begin{equation} Cov\left(X_{m},X_{n}\right) \equiv\mathbf{E}\left[\left(X_{m}-\mathbf{E}\left[X_{m}\right]\right) \left(X_{n}-\mathbf{E}\left[X_{n}\right]\right)\right] \equiv\sigma_{X_{m},X_{n}} \equiv\sigma_{m,n}, \tag{5.6} \end{equation}\] for all \(m,n=1,\dots,N\) such that \(m\neq n\).

Note that \[\begin{equation} \sigma_{m,n}\equiv Cov\left(X_{m},X_{n}\right)=Cov\left(X_{n},X_{m}\right)\equiv\sigma_{n,m}, \end{equation}\] for all \(m,n=1,\dots,N\), and the matrix \(Var\left(X\right)\) is positive semidefinite.

It is also customary to denote the variance-covariance matrix of the random variable \(X\) by \(\Sigma_{X}\) or \(\left(\sigma_{X_{m},X_{n}}\right)_{m,n=1}^{N}\) or also \(\left(\sigma_{m,n}\right)_{m,n=1}^{N}\).

Note that in case the entries \(X_{1},\dots,X_{N}\) of \(X\) are independent, we have \[\begin{equation} \sigma_{m,n}=0, \tag{5.7} \end{equation}\] for all \(m,n=1,\dots,N\) such that \(m\neq n\). In this case, we obtain

\[\begin{equation} Var\left(X\right)= \begin{pmatrix} \sigma^2_{1} & 0 & \cdots & 0 & 0 \\ 0 & \sigma^2_{2} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & \sigma^2_{N-1} & 0 \\ 0 & 0 & \cdots & 0 & \sigma^2_{N} \end{pmatrix}. \tag{5.8} \end{equation}\]

Write \(\operatorname*{diag}Var\left(X\right)\) for the diagonal matrix having for diagonal entries the corresponding entries of \(Var\left(X\right)\), that is, in terms of the entries \(X_{1},\dots,X_{N}\) of \(X\), \[\begin{equation} \operatorname*{diag}Var\left(X\right)\overset{\text{def}}{=} \begin{pmatrix} \sigma^2_{1} & 0 & \cdots & 0 & 0 \\ 0 & \sigma^2_{2} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & \sigma^2_{N-1} & 0 \\ 0 & 0 & \cdots & 0 & \sigma^2_{N} \end{pmatrix}, \tag{5.9} \end{equation}\] where \(\sigma^2_{n}\equiv\sigma^2_{X_{n}}\equiv Var\left(X_{n}\right)\), for every \(n=1,\dots,N\), and write \(\det_{N}:\mathbb{R}^{N}\times\mathbb{R}^{N}\rightarrow\mathbb{R}\) for the determinant function.

Recall that for an \(N\)-variate real random variable \(X\equiv\left(X_{1},\dots,X_{N}\right)^{\intercal}\), with finite moment of order \(2\) and such that \(\det_{N}\left(\operatorname*{diag}Var(X)\right)\neq 0\), the correlation matrix of \(X\) is given by \[\begin{equation} Corr\left(X\right)\overset{\text{def}}{=} \left(\operatorname*{diag}Var\left(X\right)\right)^{-\frac{1}{2}} Var\left(X\right) \left(\operatorname*{diag}Var\left(X\right)\right)^{-\frac{1}{2}}. \end{equation}\]

In case \(N=1\), the correlation matrix of \(X\) is simply \(1\).

In case \(N>1\), in terms of the entries \(X_{1},\dots,X_{N}\) of \(X\), we can write

\[\begin{equation} Corr\left(X\right)= \begin{pmatrix} 1 & Corr\left(X_{1},X_{2}\right) & \cdots & Corr\left(X_{1},X_{N-1}\right) & Corr\left(X_{1},X_{N}\right) \\ Corr\left(X_{2},X_{1}\right) & 1 & \cdots & Corr\left(X_{2},X_{N-1}\right) & Corr\left(X_{2},X_{N}\right) \\ \vdots & \vdots & \ddots & \vdots & \vdots\\ Corr\left(X_{N-1},X_{1}\right) & Corr\left(X_{N-1},X_{2}\right) & \cdots & 1 & Corr\left(X_{N-1},X_{N}\right) \\ Corr\left(X_{N},X_{1}\right) & Corr\left(X_{N},X_{2}\right) & \cdots & Corr\left(X_{N},X_{N-1}\right) & 1 \end{pmatrix}, \end{equation}\] where \[\begin{equation} Corr\left(X_{m},X_{n}\right)\equiv\frac{Cov\left(X_{m},X_{n}\right)} {Var\left(X_{m}\right)^{\frac{1}{2}}Var\left(X_{n}\right)^{\frac{1}{2}}} \equiv\rho_{X_{m},X_{n}} \equiv\rho_{m,n}, \end{equation}\] for all \(m,n=1,\dots,N\) such that \(m\neq n\).

Note that \[\begin{equation} \rho_{m,n}\equiv Corr\left(X_{m},X_{n}\right) =Corr\left(X_{n},X_{m}\right)\equiv\rho_{n,m}, \end{equation}\] for all \(m,n=1,\dots,N\), and the matrix \(Corr\left(X\right)\) is positive semidefinite.

It is also customary to denote the correlation matrix of the random variable \(X\) by \(\mathrm{P}_{X}\) or \(\left(\rho_{X_{m},X_{n}}\right)_{m,n=1}^{N}\) or also \(\left(\rho_{m,n}\right)_{m,n=1}^{N}\).

Note that in case the entries \(X_{1},\dots,X_{N}\) of \(X\) are independent, we have \[\begin{equation} \rho_{m,n}=0, \tag{5.10} \end{equation}\] for all \(m,n=1,\dots,N\) such that \(m\neq n\). In this case, we obtain

\[\begin{equation} Corr\left(X\right)= \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}. \tag{5.11} \end{equation}\]
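The scaling \(\left(\operatorname*{diag}Var\left(X\right)\right)^{-\frac{1}{2}} Var\left(X\right) \left(\operatorname*{diag}Var\left(X\right)\right)^{-\frac{1}{2}}\) defining the correlation matrix is implemented in base R by cov2cor; a sketch checking the two against each other on a sample variance-covariance matrix (data and seed are illustrative):

```r
# Sketch: Corr(X) = diag(Var X)^(-1/2) %*% Var(X) %*% diag(Var X)^(-1/2),
# checked against base R's cov2cor on a sample variance-covariance matrix.
set.seed(1)                                        # illustrative seed
X <- cbind(rnorm(500), rnorm(500), rnorm(500))     # illustrative trivariate sample
X[, 2] <- X[, 1] + 0.5*X[, 2]                      # induce some correlation
S <- cov(X)                                        # sample variance-covariance matrix
D_inv_sqrt <- diag(1/sqrt(diag(S)))                # (diag Var X)^(-1/2)
R_manual <- D_inv_sqrt %*% S %*% D_inv_sqrt
stopifnot(all.equal(R_manual, cov2cor(S), check.attributes = FALSE))
stopifnot(all.equal(diag(R_manual), rep(1, 3)))    # unit diagonal, as in the text
```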

Assume that the stochastic process \(\mathbf{X}\) has order \(2\).

Definition 5.3 (Variance-covariance function) We call the variance-covariance function of \(\mathbf{X}\) the map \(\Sigma_{\mathbf{X}}^{2}:\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \Sigma_{\mathbf{X}}^{2}\left(t\right)\overset{\text{def}}{=} Var\left(X_{t}\right),\quad\forall t\in\mathbb{T}, \tag{5.12} \end{equation}\] where \(Var\left(X_{t}\right)\) is the variance-covariance matrix of the \(N\)-variate real random variable \(X_{t}\equiv\left(X_{t}^{(1)},\dots,X_{t}^{(N)}\right)\), for every \(t\in\mathbb{T}\). In case \(N=1\), the variance-covariance function of \(\mathbf{X}\) is simply called the variance function of \(\mathbf{X}\) and, as it is customary, we denote it by \(\sigma^{2}_{\mathbf{X}}\) rather than \(\Sigma_{\mathbf{X}}^{2}\).

Assume that \(\det_{N}\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(t\right)\right)\neq 0\), for every \(t\in\mathbb{T}\), where \(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(t\right)\) is the diagonal matrix having for diagonal entries the corresponding entries of \(\Sigma_{\mathbf{X}}^{2}\left(t\right)\), that is, in terms of the entries \(X_{t}^{(1)},\dots,X_{t}^{(N)}\) of \(X_{t}\), \[\begin{equation} \operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(t\right) \equiv \begin{pmatrix} \sigma^{2}_{X_{t}^{(1)}} & 0 & \cdots & 0 & 0 \\ 0 & \sigma^{2}_{X_{t}^{(2)}} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & \sigma^{2}_{X_{t}^{(N-1)}} & 0 \\ 0 & 0 & \cdots & 0 & \sigma^{2}_{X_{t}^{(N)}} \end{pmatrix}. \end{equation}\]

Definition 5.4 (Correlation function) We call the correlation function of \(\mathbf{X}\) the map \(\mathrm{P}_{\mathbf{X}}:\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \mathrm{P}_{\mathbf{X}}\left(t\right)\overset{\text{def}}{=} Corr\left(X_{t}\right),\quad \forall t\in\mathbb{T}. \tag{5.13} \end{equation}\]

Recall that given a real random variable \(X:\Omega\rightarrow\mathbb{R}^{M}\), for some \(M\in\mathbb{N}\), and a random variable \(Y:\Omega\rightarrow\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\), having both finite moment of order \(2\), the covariance matrix of \(X\) and \(Y\) is the matrix in \(\mathbb{R}^{M\times N}\) given by \[\begin{equation} Cov\left(X,Y\right)\overset{\text{def}}{=} \mathbf{E}\left[\left(X-\mathbf{E}\left[X\right]\right) \left(Y-\mathbf{E}\left[Y\right]\right)^{\intercal}\right]. \end{equation}\]

Note that in case \(M=N=1\), we have \[\begin{equation} Cov\left(X,Y\right)\overset{\text{def}}{=} \mathbf{E}\left[\left(X-\mathbf{E}\left[X\right]\right) \left(Y-\mathbf{E}\left[Y\right]\right)\right]. \end{equation}\] Hence, the covariance matrix of \(X\) and \(Y\) reduces to the covariance of \(X\) and \(Y\).

In general, setting \(X\equiv\left(X_{1},\dots,X_{M}\right)^{\intercal}\) and \(Y\equiv\left(Y_{1},\dots,Y_{N}\right)^{\intercal}\), we have \[\begin{equation} Cov\left(X,Y\right)= \begin{pmatrix} Cov\left(X_{1},Y_{1}\right) & Cov\left(X_{1},Y_{2}\right) & \cdots & Cov\left(X_{1},Y_{N-1}\right) & Cov\left(X_{1},Y_{N}\right) \\ Cov\left(X_{2},Y_{1}\right) & Cov\left(X_{2},Y_{2}\right) & \cdots & Cov\left(X_{2},Y_{N-1}\right) & Cov\left(X_{2},Y_{N}\right) \\ \vdots & \vdots & \ddots & \vdots & \vdots\\ Cov\left(X_{M-1},Y_{1}\right) & Cov\left(X_{M-1},Y_{2}\right) & \cdots & Cov\left(X_{M-1},Y_{N-1}\right) & Cov\left(X_{M-1},Y_{N}\right) \\ Cov\left(X_{M},Y_{1}\right) & Cov\left(X_{M},Y_{2}\right) & \cdots & Cov\left(X_{M},Y_{N-1}\right) & Cov\left(X_{M},Y_{N}\right) \end{pmatrix}, \end{equation}\] where \[\begin{equation} Cov\left(X_{m},Y_{n}\right)\equiv \mathbf{E}\left[\left(X_{m}-\mathbf{E}\left[X_{m}\right]\right) \left(Y_{n}-\mathbf{E}\left[Y_{n}\right]\right)\right] \equiv\sigma_{X_{m},Y_{n}} \equiv\sigma_{m,n}, \end{equation}\] for all \(m=1,\dots,M,\ n=1,\dots,N\).

It is also customary to denote the covariance matrix of \(X\) and \(Y\) by \(\Sigma_{X,Y}\) or \(\left(\sigma_{X_{m},Y_{n}}\right)_{m=1,n=1}^{M,N}\) or \(\left(\sigma_{m,n}\right)_{m=1,n=1}^{M,N}\).

Recall also that the random variables \(X\) and \(Y\) are said to be uncorrelated if \[\begin{equation} Cov\left(X,Y\right)=0, \end{equation}\] where \(0\) is intended to be the zero matrix in \(\mathbb{R}^{M\times N}\).

Still assume that the stochastic process \(\mathbf{X}\) has order \(2\).

Definition 5.5 (Autocovariance Function) We call the autocovariance function of \(\mathbf{X}\) the map \(\Gamma_{\mathbf{X}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \Gamma_{\mathbf{X}}\left(s,t\right)\overset{\text{def}}{=} Cov\left(X_{s},X_{t}\right), \quad\forall s,t\in\mathbb{T}, \tag{5.14} \end{equation}\] where \(Cov\left(X_{s},X_{t}\right)\) is the covariance matrix of the \(N\)-variate real random variables \(X_{s}\) and \(X_{t}\) in the process \(\mathbf{X}\). In case \(N=1\), as it is customary, we denote the autocovariance function of \(\mathbf{X}\) by \(\gamma_{\mathbf{X}}\) rather than \(\Gamma_{\mathbf{X}}\).

Definition 5.6 (Not autocorrelated stochastic process) We say that the process \(\mathbf{X}\) is not autocorrelated if we have \[\begin{equation} \Gamma_{\mathbf{X}}\left(s,t\right)=0, \tag{5.15} \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(s\neq t\).

A stochastic process \(\mathbf{X}\) is not autocorrelated whenever the random variables \(X_{s}\) and \(X_{t}\) in \(\mathbf{X}\) are uncorrelated for all \(s,t\in\mathbb{T}\) such that \(s\neq t\).
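For \(N=1\), a sample counterpart of \(\gamma_{\mathbf{X}}\) (under stationarity, a function of the lag \(t-s\) only) is computed by base R's acf. For a white-noise path, the sample autocorrelations at all nonzero lags should be close to zero; a sketch with illustrative path length and seed:

```r
# Sketch: sample autocorrelations of Gaussian white noise are close to zero
# at every nonzero lag, consistent with a not autocorrelated process.
set.seed(1)                                            # illustrative seed
x <- rnorm(10000)                                      # one long white-noise path
rho_hat <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]  # drop lag 0 (always 1)
show(round(rho_hat, 3))
# Approximate 95% band for white noise: +/- 1.96/sqrt(length(x))
stopifnot(all(abs(rho_hat) < 5/sqrt(length(x))))
```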

Assume that \(\det_{N}\left(\operatorname*{diag}Var\left(X_{t}\right)\right)\neq 0\), for every \(t\in\mathbb{T}\), where, as above, \(\det_{N}:\mathbb{R}^{N}\times\mathbb{R}^{N}\rightarrow\mathbb{R}\) is the determinant function and \(\operatorname*{diag}Var\left(X_{t}\right)\) is the diagonal matrix in \(\mathbb{R}^{N}\times\mathbb{R}^{N}\) having for diagonal entries the corresponding diagonal entries of \(Var\left(X_{t}\right)\).

Definition 5.7 (Autocorrelation function) We call the autocorrelation function of \(\mathbf{X}\) the map \(\mathrm{P}_{\mathbf{X}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \mathrm{P}_{\mathbf{X}}\left(s,t\right)\overset{\text{def}}{=} Corr\left(X_{s},X_{t}\right), \quad\forall s,t\in\mathbb{T}, \tag{5.16} \end{equation}\] where \(Corr\left(X_{s},X_{t}\right)\) is the correlation of the \(N\)-variate real random variables \(X_{s}\) and \(X_{t}\) in \(\mathbf{X}\). In case \(N=1\), as is customary, we denote the autocorrelation function of \(\mathbf{X}\) by \(\rho_{\mathbf{X}}\) rather than \(\mathrm{P}_{\mathbf{X}}\).

We recall that given an \(M\)-variate real random variable \(X:\Omega\rightarrow\mathbb{R}^{M}\), for some \(M\in\mathbb{N}\), and a random variable \(Y:\Omega\rightarrow\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\), both having finite moment of order \(2\) and such that \(\det_{M}\left(\operatorname*{diag}Var\left(X\right)\right) \det_{N}\left(\operatorname*{diag}Var\left(Y\right)\right)\neq 0\), where \(\det_{M}\) [resp. \(\det_{N}\)] is the determinant function on \(\mathbb{R}^{M}\times\mathbb{R}^{M}\) [resp. \(\mathbb{R}^{N}\times\mathbb{R}^{N}\)] and \(\operatorname*{diag}Var\left(X\right)\) [resp. \(\operatorname*{diag}Var\left(Y\right)\)] is the diagonal matrix in \(\mathbb{R}^{M}\times\mathbb{R}^{M}\) [resp. \(\mathbb{R}^{N}\times\mathbb{R}^{N}\)] having for diagonal entries the corresponding diagonal entries of \(Var\left(X\right)\) [resp. \(Var\left(Y\right)\)], the correlation matrix of \(X\) and \(Y\) is the matrix in \(\mathbb{R}^{M\times N}\) given by \[\begin{equation} Corr\left(X,Y\right)\overset{\text{def}}{=} \left(\operatorname*{diag}Var\left(X\right)\right)^{-\frac{1}{2}} Cov\left(X,Y\right) \left(\operatorname*{diag}Var\left(Y\right)\right)^{-\frac{1}{2}}. \tag{5.17} \end{equation}\]

Note that in the case \(M=N=1\), Equation (5.17) reduces to \[\begin{equation} Corr\left(X,Y\right)= \frac{Cov\left(X,Y\right)}{Var\left(X\right)^{\frac{1}{2}}Var\left(Y\right)^{\frac{1}{2}}}. \tag{5.18} \end{equation}\]
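A hedged numerical check of Equation (5.17) may be useful. The Python sketch below (illustrative only; NumPy is assumed, and the variables \(W\), \(X\), \(Y\) are ad-hoc constructions, not taken from these notes) builds correlated samples and normalizes the empirical covariance block by the diagonal standard deviations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw R samples of a bivariate X and a bivariate Y with linear dependence,
# then evaluate Equation (5.17):
# Corr(X, Y) = diag(Var X)^{-1/2} Cov(X, Y) diag(Var Y)^{-1/2}.
R = 200_000
W = rng.standard_normal((R, 3))
X = W[:, :2]                                                  # M = 2 components
Y = np.column_stack([W[:, 0] + W[:, 2], W[:, 1] - W[:, 2]])   # N = 2 components

C = np.cov(np.hstack([X, Y]).T)      # full 4x4 empirical covariance matrix
cov_XY = C[:2, 2:]                   # Cov(X, Y) block
d_X = np.diag(1 / np.sqrt(np.diag(C[:2, :2])))   # diag(Var X)^{-1/2}
d_Y = np.diag(1 / np.sqrt(np.diag(C[2:, 2:])))   # diag(Var Y)^{-1/2}
corr_XY = d_X @ cov_XY @ d_Y

# Each entry is Cov(X_m, Y_n) scaled by the two standard deviations,
# hence lies in [-1, 1].
print(corr_XY.round(3))
```

Here \(Cov\left(X_{1},Y_{1}\right)=1\) and \(Var\left(Y_{1}\right)=2\) by construction, so the \((1,1)\) entry should be close to \(1/\sqrt{2}\), while the off-diagonal entries should be close to \(0\).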

We recall also that if the random variables are uncorrelated, then

\[\begin{equation} Corr\left(X,Y\right)=0, \end{equation}\] where \(0\) is intended to be the zero matrix in \(\mathbb{R}^{M\times N}\).

Therefore,

Proposition 5.2 (Not autocorrelated stochastic process) If the stochastic process \(\mathbf{X}\) is not autocorrelated, then we have \[\begin{equation} \mathrm{P}_{\mathbf{X}}\left(s,t\right)=0, \tag{5.19} \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(s\neq t\).

The autocovariance and autocorrelation functions express the temporal “linear dependence” of the random variables in a stochastic process.

For simplicity, assume now that we have \(\mathbb{T}\equiv\mathbb{Z}\), where \(\mathbb{T}\) is the time set of the process \(\mathbf{X}\).

Definition 5.8 (Partial autocovariance function) We call the partial autocovariance function of \(\mathbf{X}\) the map \(\Phi_{\mathbf{X}}:\mathbb{Z}\times\mathbb{Z}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \Phi_{\mathbf{X}}\left(s,t\right)\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} Var(X_{t}) & \text{if }s=t\\ Cov\left(X_{t-1},X_{t}\right) & \text{if }s=t-1\\ Cov\left(X_{t+1},X_{t}\right) & \text{if }s=t+1\\ Cov\left(X_{s}-\mathbf{P}\left[X_{s}\mid X_{s+1},\dots,X_{t-1}\right] ,X_{t}-\mathbf{P}\left[X_{t}\mid X_{s+1},\dots,X_{t-1}\right]\right)& \text{if }s<t\\ Cov\left(X_{s}-\mathbf{P}\left[X_{s}\mid X_{t+1},\dots,X_{s-1}\right] ,X_{t}-\mathbf{P}\left[X_{t}\mid X_{t+1},\dots,X_{s-1}\right]\right)& \text{if }s>t \end{array} \right. \tag{5.20} \end{equation}\] where \(\mathbf{P}\left[X_{s}\mid X_{s+1},\dots,X_{t-1}\right]\) and \(\mathbf{P}\left[X_{s}\mid X_{t+1},\dots,X_{s-1}\right]\) [resp. \(\mathbf{P}\left[X_{t}\mid X_{s+1},\dots,X_{t-1}\right]\) and \(\mathbf{P}\left[X_{t}\mid X_{t+1},\dots,X_{s-1}\right]\)] are the orthogonal projection of \(X_{s}\) [resp. \(X_{t}\)] on the subspace of \(L^{2}\left(\Omega;\mathbb{R}^{N}\right)\) generated by the random variables \(X_{s+1},\dots,X_{t-1}\) and \(X_{t+1},\dots,X_{s-1}\), respectively. In case \(N=1\), we denote the partial autocovariance function of \(\mathbf{X}\) by \(\phi_{\mathbf{X}}\) rather than \(\Phi_{\mathbf{X}}\).

Remark (Partial autocovariance function in case N=1). In case \(N=1\), we have \[\begin{equation} \phi_{\mathbf{X}}\left(s,t\right)=\phi_{\mathbf{X}}\left(t,s\right), \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(s\neq t\).

Proposition 5.3 (Partial autocovariance function in case N=1) In case \(N=1\), consider \(s<t\) and set \[\begin{equation} \Sigma_{X_{s+1},\dots,X_{t-1}} \equiv\mathbf{E}\left[\left(X_{s+1}-\mathbf{E}\left[X_{s+1}\right],\dots, X_{t-1}-\mathbf{E}\left[X_{t-1}\right]\right)^{\intercal} \left(X_{s+1}-\mathbf{E}\left[X_{s+1}\right],\dots,X_{t-1}-\mathbf{E}\left[X_{t-1}\right]\right)\right], \end{equation}\] \[\begin{equation} \Gamma_{X_{s},\left(X_{s+1},\dots,X_{t-1}\right)} \equiv\left(\mathbf{E}\left[\left(X_{s}-\mathbf{E}\left[X_{s}\right]\right) \left(X_{s+1}-\mathbf{E}\left[X_{s+1}\right]\right)\right],\dots, \mathbf{E}\left[\left(X_{s}-\mathbf{E}\left[X_{s}\right]\right) \left(X_{t-1}-\mathbf{E}\left[X_{t-1}\right]\right)\right]\right) \end{equation}\] \[\begin{equation} \text{[resp. }\Gamma_{X_{t},\left(X_{s+1},\dots,X_{t-1}\right)} \equiv\left(\mathbf{E}\left[\left(X_{t}-\mathbf{E}\left[X_{t}\right]\right) \left(X_{s+1}-\mathbf{E}\left[X_{s+1}\right]\right)\right],\dots, \mathbf{E}\left[\left(X_{t}-\mathbf{E}\left[X_{t}\right]\right) \left(X_{t-1}-\mathbf{E}\left[X_{t-1}\right]\right)\right]\right)\text{]}. \end{equation}\] If the matrix \(\Sigma_{X_{s+1},\dots,X_{t-1}}\) is non-singular, we have \[\begin{equation} \mathbf{P}\left[X_{s}\mid X_{s+1},\dots,X_{t-1}\right] =\Gamma_{X_{s},\left(X_{s+1},\dots,X_{t-1}\right)} \Sigma_{X_{s+1},\dots,X_{t-1}}^{-1}\left(X_{s+1},\dots,X_{t-1}\right)^{\intercal} \end{equation}\] \[\begin{equation} \text{[resp. }\mathbf{P}\left[X_{t}\mid X_{s+1},\dots,X_{t-1}\right] =\Gamma_{X_{t},\left(X_{s+1},\dots,X_{t-1}\right)} \Sigma_{X_{s+1},\dots,X_{t-1}}^{-1}\left(X_{s+1},\dots,X_{t-1}\right)^{\intercal}\text{]}. \end{equation}\]

Definition 5.9 (Partial autocorrelation function) Assume that \(\det\left(\operatorname*{diag}Var\left(X_{s}\right)\right) \det\left(\operatorname*{diag}Var\left(X_{t}\right)\right)\neq 0\), for all \(s,t\in\mathbb{T}\). Then, we call the partial autocorrelation function of \(\mathbf{X}\) the map \(\Psi_{\mathbf{X}}:\mathbb{Z}\times\mathbb{Z}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \Psi_{\mathbf{X}}\left(s,t\right)\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} 1 & \text{if }s=t\\ Corr\left(X_{t-1},X_{t}\right) & \text{if }s=t-1\\ Corr\left(X_{t+1},X_{t}\right) & \text{if }s=t+1\\ Corr\left(X_{s}-\mathbf{P}\left[X_{s}\mid X_{s+1},\dots,X_{t-1}\right] ,X_{t}-\mathbf{P}\left[X_{t}\mid X_{s+1},\dots,X_{t-1}\right]\right)& \text{if }s<t\\ Corr\left(X_{s}-\mathbf{P}\left[X_{s}\mid X_{t+1},\dots,X_{s-1}\right] ,X_{t}-\mathbf{P}\left[X_{t}\mid X_{t+1},\dots,X_{s-1}\right]\right)& \text{if }s>t \end{array} \right. \tag{5.21} \end{equation}\] where, as in Definition 5.8, \(\mathbf{P}\left[X_{s}\mid X_{s+1},\dots,X_{t-1}\right]\) and \(\mathbf{P}\left[X_{s}\mid X_{t+1},\dots,X_{s-1}\right]\) [resp. \(\mathbf{P}\left[X_{t}\mid X_{s+1},\dots,X_{t-1}\right]\) and \(\mathbf{P}\left[X_{t}\mid X_{t+1},\dots,X_{s-1}\right]\)] are the orthogonal projections of \(X_{s}\) [resp. \(X_{t}\)] on the subspaces of \(L^{2}\left(\Omega;\mathbb{R}^{N}\right)\) generated by the random variables \(X_{s+1},\dots,X_{t-1}\) and \(X_{t+1},\dots,X_{s-1}\), respectively, and, according to Equation (5.17), \[\begin{equation} Corr\left(X,Y\right)\overset{\text{def}}{=} \left(\operatorname*{diag}Var\left(X\right)\right)^{-\frac{1}{2}} Cov\left(X,Y\right) \left(\operatorname*{diag}Var\left(Y\right)\right)^{-\frac{1}{2}}. \end{equation}\] In case \(N=1\), we denote the partial autocorrelation function of \(\mathbf{X}\) by \(\psi_{\mathbf{X}}\) rather than \(\Psi_{\mathbf{X}}\).

Remark (Partial autocorrelation function in case N=1). In case \(N=1\), we have \[\begin{equation} \psi_{\mathbf{X}}\left(s,t\right)=\psi_{\mathbf{X}}\left(t,s\right), \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(s\neq t\) and \(\sigma_{X_{s}}\sigma_{X_{t}}\neq 0\).
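The projection characterization above can be explored numerically. The Python sketch below (illustrative only; NumPy is assumed, and the AR(1) recursion \(X_{t}=\phi X_{t-1}+\varepsilon_{t}\) is chosen merely as a convenient test case, it is not introduced in these notes until later) estimates \(\psi_{\mathbf{X}}\left(s,t\right)\) by regressing \(X_{s}\) and \(X_{t}\) on the intermediate variables and correlating the residuals; for an AR(1), the partial autocorrelation vanishes once \(\left\vert t-s\right\vert\geq 2\).

```python
import numpy as np

rng = np.random.default_rng(2)

# For a univariate AR(1), X_t = phi X_{t-1} + eps_t, the partial autocorrelation
# psi(s, t) vanishes for |t - s| >= 2: given the intermediate values, X_s
# carries no extra linear information about X_t.  Estimate it from R
# replications by projecting (regressing) X_s and X_t on X_{s+1},...,X_{t-1}
# and correlating the residuals.
phi, R, T = 0.8, 100_000, 6
X = np.zeros((R, T))
X[:, 0] = rng.standard_normal(R) / np.sqrt(1 - phi**2)   # stationary start
for t in range(1, T):
    X[:, t] = phi * X[:, t - 1] + rng.standard_normal(R)

s, t = 1, 4                           # 0-based columns, lag |t - s| = 3
mid = X[:, s + 1:t]                   # intermediate variables X_{s+1},...,X_{t-1}
A = np.column_stack([np.ones(R), mid])
res_s = X[:, s] - A @ np.linalg.lstsq(A, X[:, s], rcond=None)[0]
res_t = X[:, t] - A @ np.linalg.lstsq(A, X[:, t], rcond=None)[0]
psi_hat = np.corrcoef(res_s, res_t)[0, 1]
print(psi_hat)                        # close to 0 for an AR(1) at lag >= 2
```

The least-squares residuals play the role of \(X_{s}-\mathbf{P}\left[X_{s}\mid X_{s+1},\dots,X_{t-1}\right]\) and \(X_{t}-\mathbf{P}\left[X_{t}\mid X_{s+1},\dots,X_{t-1}\right]\) in Definition 5.9.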

6 Strong-Sense Stationary (SSS) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with state space \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\).

Definition 6.1 (Multi-index) Given any \(L\in\mathbb{N}\), we call a multi-index of length \(L\) on \(\mathbb{T}\) any sequence \(\left(t_{\ell}\right)_{\ell=1}^{L}\) of distinct elements of \(\mathbb{T}\). A multi-index on \(\mathbb{T}\) is said to be increasing if \(t_{\ell}<t_{\ell+1}\), for every \(\ell=1,\dots,L-1\). A multi-index on \(\mathbb{T}\) is also called a time multi-index when no ambiguity can arise about the time set.

Any multi-index [resp. increasing multi-index] on \(\mathbb{T}\) can be identified with a finite permutation [resp. combination] of the elements of \(\mathbb{T}\).

We write \(\mathcal{P}_{fin}\left(\mathbb{T}\right)\) [resp. \(\mathcal{C}_{fin}\left(\mathbb{T}\right)\)] for the set of all multi-indices [resp. increasing multi-indices] on \(\mathbb{T}\).

Definition 6.2 (Shifted multi-index) Given any \(\mathbf{t}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), such that \(\mathbf{t}\equiv\left(t_{\ell}\right)_{\ell=1}^{L}\) for some \(L\in\mathbb{N}\), and any \(\tau\in\mathbb{R}\), we call \(\tau\)-shift of \(\mathbf{t}\) the time multi-index \(\mathbf{t}_{\tau}\) given by \[\begin{equation} \mathbf{t}_{\tau}\overset{\text{def}}{=}\left(t_{1}+\tau,\dots,t_{L}+\tau\right) \equiv\left(t_{\ell}+\tau\right)_{\ell=1}^{L}. \tag{6.1} \end{equation}\] Note that in Time Series literature the time multi-index \(\mathbf{t}_{\tau}\) is often referred to as \(\tau\)-lag of \(\mathbf{t}\).

Definition 6.3 (Strong-sense stationary (SSS) processes) We say that the process \(\mathbf{X}\) is strong-sense stationary (SSS) or strongly stationary if we have \[\begin{equation} \mathbf{P}\left(X_{t_{1}}\leq x_{1},\dots,X_{t_{L}}\leq x_{L}\right) =\mathbf{P}\left(X_{t_{1}+\tau}\leq x_{1},\dots,X_{t_{L}+\tau}\leq x_{L}\right), \tag{6.2} \end{equation}\] for every \(\mathbf{t}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), such that \(\mathbf{t}\equiv\left(t_{\ell}\right)_{\ell=1}^{L}\) for some \(L\in\mathbb{N}\), every \(\tau\in\mathbb{R}\), such that \(\mathbf{t}_{\tau}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), and every \(\left(x_{1},\dots,x_{L}\right)\in\mathsf{X}_{\ell=1}^{L}\mathbb{R}^{N}\).

Proposition 6.1 (Distribution of random vectors in an SSS process) The process \(\mathbf{X}\) is an SSS process if and only if the \(L\)-dimensional vectors of \(N\)-variate real random variables \(\left(X_{t_{1}},\dots,X_{t_{L}}\right)\) and \(\left(X_{t_{1}+\tau},\dots,X_{t_{L}+\tau}\right)\) are identically distributed, for every \(\mathbf{t}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), such that \(\mathbf{t}\equiv\left(t_{\ell}\right)_{\ell=1}^{L}\) for some \(L\in\mathbb{N}\), and every \(\tau\in\mathbb{R}\), such that \(\mathbf{t}_{\tau}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\).

Corollary 6.1 (Distribution of random variables in an SSS process) If \(\mathbf{X}\) is an SSS process, then the \(N\)-variate real random variables in \(\mathbf{X}\) are identically distributed. However, the converse is not true.

Example 6.1 (Not SSS process with identically distributed random variables) Consider the discrete real random variables \(Y\) and \(Z\) whose joint distribution is given by the following table: \[\begin{equation} \begin{array} [c]{cccc} Y/Z & 0 & 1 & 2\\ 0 & 0 & 1/7 & 1/7\\ 1 & 2/7 & 0 & 1/7\\ 2 & 0 & 2/7 & 0 \end{array} \end{equation}\] We have \[\begin{equation} \mathbf{P}\left(Y=0,Z=1\right)=1/7\quad\text{and}\quad\mathbf{P}\left(Z=0,Y=1\right)=2/7. \end{equation}\] Therefore, the \(2\)-variate real random variables \(\left(Y,Z\right)\) and \(\left(Z,Y\right)\) have different distributions. On the other hand, the distributions of \(Y\) and \(Z\) are given by \[\begin{equation} \mathbf{P}\left(Y=0\right)=2/7,\quad\mathbf{P}\left(Y=1\right)=3/7,\quad\mathbf{P}\left(Y=2\right)=2/7 \end{equation}\] and \[\begin{equation} \mathbf{P}\left(Z=0\right)=2/7,\quad\mathbf{P}\left(Z=1\right)=3/7,\quad\mathbf{P}\left(Z=2\right)=2/7. \end{equation}\] Hence, the random variables \(Y\) and \(Z\) have the same distribution. Now, consider the real stochastic process \(\left(X_{t}\right)_{t=1}^{3}\equiv\mathbf{X}\) given by \[\begin{equation} X_{1}\overset{\text{def}}{=}Y,\quad X_{2}\overset{\text{def}}{=}Z,\quad X_{3}\overset{\text{def}}{=}Y. \end{equation}\] The random variables in \(\mathbf{X}\) have the same distribution, but the distribution of \(\left(X_{1},X_{2}\right)=\left(Y,Z\right)\) is different from the distribution of \(\left(X_{1+1},X_{2+1}\right)=\left(X_{2},X_{3}\right)=\left(Z,Y\right)\). This prevents the process \(\mathbf{X}\) from being strong-sense stationary.
Note that \[\begin{equation} \mathbf{E}\left[Y\right]=1,\quad\mathbf{E}\left[Z\right]=1. \end{equation}\] Furthermore, since \[\begin{align} \mathbf{P}\left(YZ=0\right)& =\mathbf{P}\left(Y=0\vee Z=0\right) =\mathbf{P}\left(Y=0\right)+\mathbf{P}\left(Z=0\right)-\mathbf{P}\left(Y=0,Z=0\right)=4/7,\\ \mathbf{P}\left(YZ=1\right)& =\mathbf{P}\left(Y=1,Z=1\right)=0,\\ \mathbf{P}\left(YZ=2\right)& =\mathbf{P}\left(Y=1,Z=2\vee Y=2,Z=1\right) =\mathbf{P}\left(Y=1,Z=2\right)+\mathbf{P}\left(Y=2,Z=1\right)=3/7,\\ \mathbf{P}\left(YZ=4\right)& =\mathbf{P}\left(Y=2,Z=2\right)=0, \end{align}\] we have \[\begin{equation} \mathbf{E}\left[YZ\right]=6/7. \end{equation}\] It follows that \[\begin{equation} Cov\left(Y,Z\right)=\mathbf{E}\left[YZ\right]-\mathbf{E}\left[Y\right]\mathbf{E}\left[Z\right]=-1/7. \end{equation}\] This shows that \(Y\) and \(Z\) are correlated and, a fortiori, not independent. This clearly implies that the random variables in the process \(\mathbf{X}\) are not independent.
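The computations of Example 6.1 can be verified exactly with rational arithmetic. The short Python sketch below (illustrative only, using the standard library `fractions` module) reproduces the marginals, \(\mathbf{E}\left[YZ\right]\), and the covariance \(-1/7\) from the distribution table.

```python
from fractions import Fraction as F

# Joint distribution table of (Y, Z) from Example 6.1: p[y][z].
p = {0: {0: F(0), 1: F(1, 7), 2: F(1, 7)},
     1: {0: F(2, 7), 1: F(0), 2: F(1, 7)},
     2: {0: F(0), 1: F(2, 7), 2: F(0)}}

pY = {y: sum(p[y].values()) for y in p}                  # marginal of Y
pZ = {z: sum(p[y][z] for y in p) for z in (0, 1, 2)}     # marginal of Z

EY = sum(y * pY[y] for y in pY)
EZ = sum(z * pZ[z] for z in pZ)
EYZ = sum(y * z * p[y][z] for y in p for z in p[y])

print(pY == pZ)          # True: Y and Z are identically distributed
print(EYZ - EY * EZ)     # -1/7: Y and Z are correlated
```

Exact fractions avoid any floating-point doubt about whether the covariance is genuinely non-zero.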

Corollary 6.2 (Moments of random variables in an SSS process of order K) If \(\mathbf{X}\) is an SSS process of order \(K\), then the moments of \(\mathbf{X}\) of all orders up to and including the \(K\)th are time invariant.

Corollary 6.3 (Independent and identically distributed SSS process) Assume that the random variables in the process \(\mathbf{X}\) are independent and identically distributed. Then \(\mathbf{X}\) is an SSS process.

Proof. By independence and identical distribution, we can write \[\begin{align} \mathbf{P}\left(X_{t_{1}}\leq x_{1},\dots,X_{t_{L}}\leq x_{L}\right) =\mathbf{P}\left(X_{t_{1}}\leq x_{1}\right)\cdots\mathbf{P}\left(X_{t_{L}}\leq x_{L}\right)\\ =\mathbf{P}\left(X_{t_{1}+\tau}\leq x_{1}\right)\cdots\mathbf{P}\left(X_{t_{L}+\tau}\leq x_{L}\right)\\ =\mathbf{P}\left(X_{t_{1}+\tau}\leq x_{1},\dots,X_{t_{L}+\tau}\leq x_{L}\right), \end{align}\] for every \(\mathbf{t}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), such that \(\mathbf{t}\equiv\left(t_{\ell}\right)_{\ell=1}^{L}\) for some \(L\in\mathbb{N}\), every \(\tau\in\mathbb{R}\), such that \(\mathbf{t}_{\tau}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), and every \(\left(x_{1},\dots,x_{L}\right)\in\mathsf{X}_{\ell=1}^{L}\mathbb{R}^{N}\). This proves the desired result.

Corollary 6.3 presents a simple and important family of SSS processes. However, independence of the random variables in the process \(\mathbf{X}\) is not a necessary condition for \(\mathbf{X}\) to be strong-sense stationary.

Example 6.2 (Not independent and identically distributed SSS process) Let \(X\) be an \(N\)-variate real random variable on a probability space \(\Omega\) with finite moment of order \(2\). Choose any \(\mathbb{T}\subseteq\mathbb{R}\) and consider the stochastic process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) given by \[\begin{equation} X_{t}\overset{\text{def}}{=}X,\quad\forall t\in\mathbb{T}. \end{equation}\] Then the random variables in \(\mathbf{X}\) are not uncorrelated and the process \(\mathbf{X}\) is strongly stationary. In fact, we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\mathbf{E}\left[X_{t}\right]=\mathbf{E}\left[X\right]\equiv\mu_{X} \quad\text{and}\quad \Sigma_{\mathbf{X}}^{2}\left(t\right)=Var\left(X_{t}\right)=Var\left(X\right)\equiv\Sigma_{X}^{2} \end{equation}\] for every \(t\in\mathbb{T}\). Furthermore, \[\begin{equation} \Gamma_{\mathbf{X}}\left(s,t\right)=Cov\left(X_{s},X_{t}\right)=Cov\left(X,X\right) =Var\left(X\right)\equiv\Sigma_{X}^{2}, \end{equation}\] for all \(s,t\in\mathbb{T}\). Therefore, the process \(\mathbf{X}\) has constant mean, constant variance-covariance, and constant autocovariance function. In particular, as a consequence of the constant non-zero autocovariance function, the random variables in \(\mathbf{X}\) are not uncorrelated. On the other hand, we clearly have \[\begin{equation} \mathbf{P}\left(X_{t_{1}}\leq x_{1},\dots,X_{t_{L}}\leq x_{L}\right) =\mathbf{P}\left(X\leq x_{1},\dots,X\leq x_{L}\right) =\mathbf{P}\left(X_{t_{1}+\tau}\leq x_{1},\dots,X_{t_{L}+\tau}\leq x_{L}\right), \end{equation}\] for every \(\mathbf{t}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), such that \(\mathbf{t}\equiv\left(t_{\ell}\right)_{\ell=1}^{L}\) for some \(L\in\mathbb{N}\), every \(\tau\in\mathbb{R}\), such that \(\mathbf{t}_{\tau}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), and every \(\left(x_{1},\dots,x_{L}\right)\in\mathsf{X}_{\ell=1}^{L}\mathbb{R}^{N}\). This yields the strong stationarity of \(\mathbf{X}\).
Note that for every \(\omega\in\Omega\) we have \[\begin{equation} X_{t}\left(\omega\right)=X\left(\omega\right), \end{equation}\] as \(t\) varies over \(\mathbb{T}\). Hence, given any \(\omega\in\Omega\), the graph of the \(\omega\)-sample path \(\Gamma_{\mathbf{X}}\left(\omega\right)\equiv\left\{\left(t,x\right)\in\mathbb{T}\times\mathbb{R}^{N}:x=X_{t}\left(\omega\right)\right\}\) of the process \(\mathbf{X}\) is a horizontal straight line with intercept \(X\left(\omega\right)\).

Less trivial examples are the following.

Example 6.3 (Not independent and identically distributed SSS process) Fix any time set \(\mathbb{T}\subseteq\mathbb{R}\), let \(\left(Y_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{Y}\) be a real stochastic process on a probability space \(\Omega\) of independent standard Rademacher random variables, and let \(Z\) be a standard Gaussian random variable on \(\Omega\), independent of the random variables in \(\mathbf{Y}\). Set \[ X_{t}\overset{\text{def}}{=}Y_{t}Z,\quad\forall t\in\mathbb{T}, \] and consider the process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\). Then the random variables in \(\mathbf{X}\) are standard Gaussian distributed, uncorrelated but not independent, and the process \(\mathbf{X}\) is SSS.

Proof. By virtue of the independence between \(Z\) and the random variables in \(\mathbf{Y}\), we have \[ \mu_{\mathbf{X}}\left(t\right)=\mathbf{E}\left[X_{t}\right]=\mathbf{E}\left[Y_{t}Z\right] =\mathbf{E}\left[Y_{t}\right]\mathbf{E}\left[Z\right]=0 \] and \[ \sigma^{2}_{\mathbf{X}}\left(t\right)=\mathbf{D}^{2}\left[X_{t}\right]=\mathbf{D}^{2}\left[Y_{t}Z\right] =\mathbf{E}\left[Y_{t}^{2}Z^{2}\right]=\mathbf{E}\left[Y_{t}^{2}\right]\mathbf{E}\left[Z^{2}\right]=1, \] for every \(t\in\mathbb{T}\). Furthermore, since the random variables in \(\mathbf{Y}\) are also independent, we have \[ \gamma_{\mathbf{X}}\left(s,t\right)=Cov\left(Y_{s}Z,Y_{t}Z\right) =\mathbf{E}\left[Y_{s}Y_{t}Z^{2}\right]=\mathbf{E}\left[Y_{s}\right] \mathbf{E}\left[Y_{t}\right]\mathbf{E}\left[Z^{2}\right]=0, \] for all \(s,t\in\mathbb{T}\) such that \(s\neq t\). That is, the random variables in \(\mathbf{X}\) are uncorrelated. The lack of independence of the random variables in \(\mathbf{X}\) can be argued by considering the random variable \[ X_{t}^{2}=Y_{t}^{2}Z^{2}=Z^{2}, \] for every \(t\in\mathbb{T}\), and observing that \[ Cov\left(X_{s}^{2},X_{t}^{2}\right)=Cov\left(Z^{2},Z^{2}\right)=\mathbf{D}^{2}\left[Z^{2}\right] =\mathbf{E}\left[Z^{4}\right]-\mathbf{E}\left[Z^{2}\right]^{2}=3-1=2, \] for all \(s,t\in\mathbb{T}\). Since this covariance is non-zero, the random variables \(X_{s}^{2}\) and \(X_{t}^{2}\) are not independent, and hence neither are the random variables in \(\mathbf{X}\).
In the end, applying the total probability theorem and considering again the independence between \(Z\) and the random variables in \(\mathbf{Y}\), we have \[\begin{align} & \mathbf{P}\left(X_{t_{1}}\leq x_{1},\dots,X_{t_{L}}\leq x_{L}\right) \\ & =\mathbf{P}\left(Y_{t_{1}}Z\leq x_{1},\dots,Y_{t_{L}}Z\leq x_{L}\right)\\ & =\sum\limits_{\left(e_{1},\dots,e_{L}\right)\in\left\{-1,1\right\}^{L}} \mathbf{P}\left(Y_{t_{1}}Z\leq x_{1},\dots,Y_{t_{L}}Z\leq x_{L} \mid Y_{t_{1}}=e_{1},\dots,Y_{t_{L}}=e_{L}\right)\mathbf{P}\left(Y_{t_{1}}=e_{1},\dots,Y_{t_{L}}=e_{L}\right)\\ & =\sum\limits_{\left(e_{1},\dots,e_{L}\right) \in\left\{-1,1\right\}^{L}} \mathbf{P}\left(e_{1}Z\leq x_{1},\dots,e_{L}Z\leq x_{L} \mid Y_{t_{1}}=e_{1},\dots,Y_{t_{L}}=e_{L}\right)\mathbf{P}\left(Y_{t_{1}}=e_{1},\dots,Y_{t_{L}}=e_{L}\right)\\ & =\sum\limits_{\left(e_{1},\dots,e_{L}\right)\in\left\{-1,1\right\}^{L}} \mathbf{P}\left(e_{1}Z\leq x_{1},\dots,e_{L}Z\leq x_{L}\right) \mathbf{P}\left(Y_{t_{1}}=e_{1},\dots,Y_{t_{L}}=e_{L}\right)\\ & =\sum\limits_{\left(e_{1},\dots,e_{L}\right)\in\left\{-1,1\right\}^{L}} \mathbf{P}\left(e_{1}Z\leq x_{1},\dots,e_{L}Z\leq x_{L}\right) \mathbf{P}\left(Y_{t_{1}}=e_{1}\right)\cdots\mathbf{P}\left(Y_{t_{L}}=e_{L}\right)\\ & =\sum\limits_{\left(e_{1},\dots,e_{L}\right)\in\left\{-1,1\right\}^{L}} \mathbf{P}\left(e_{1}Z\leq x_{1},\dots,e_{L}Z\leq x_{L}\right) \mathbf{P}\left(Y_{t_{1}+\tau}=e_{1}\right)\cdots\mathbf{P}\left(Y_{t_{L}+\tau}=e_{L}\right)\\ & =\sum\limits_{\left(e_{1},\dots,e_{L}\right)\in\left\{-1,1\right\}^{L}} \mathbf{P}\left(e_{1}Z\leq x_{1},\dots,e_{L}Z\leq x_{L}\right) \mathbf{P}\left(Y_{t_{1}+\tau}=e_{1},\dots,Y_{t_{L}+\tau}=e_{L}\right)\\ & =\sum\limits_{\left(e_{1},\dots,e_{L}\right)\in\left\{-1,1\right\}^{L}} \mathbf{P}\left(e_{1}Z\leq x_{1},\dots,e_{L}Z\leq x_{L} \mid Y_{t_{1}+\tau}=e_{1},\dots,Y_{t_{L}+\tau}=e_{L}\right) \mathbf{P}\left(Y_{t_{1}+\tau}=e_{1},\dots,Y_{t_{L}+\tau}=e_{L}\right)\\ & 
=\sum\limits_{\left(e_{1},\dots,e_{L}\right)\in\left\{-1,1\right\}^{L}} \mathbf{P}\left(Y_{t_{1}+\tau}Z\leq x_{1},\dots,Y_{t_{L}+\tau}Z\leq x_{L} \mid Y_{t_{1}+\tau}=e_{1},\dots,Y_{t_{L}+\tau}=e_{L}\right) \mathbf{P}\left(Y_{t_{1}+\tau}=e_{1},\dots,Y_{t_{L}+\tau}=e_{L}\right)\\ & =\mathbf{P}\left(Y_{t_{1}+\tau}Z\leq x_{1},\dots,Y_{t_{L}+\tau}Z\leq x_{L}\right)\\ & =\mathbf{P}\left(X_{t_{1}+\tau}\leq x_{1},\dots,X_{t_{L}+\tau}\leq x_{L}\right), \end{align}\] for every \(\mathbf{t}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), such that \(\mathbf{t}\equiv\left(t_{\ell}\right)_{\ell=1}^{L}\) for some \(L\in\mathbb{N}\), every \(\tau\in\mathbb{R}\), such that \(\mathbf{t}_{\tau}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), and every \(\left(x_{1},\dots,x_{L}\right)\in\mathsf{X}_{\ell=1}^{L}\mathbb{R}^{N}\). This proves that \(\mathbf{X}\) is SSS.
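A simulation makes the conclusions of Example 6.3 tangible. The Python sketch below (illustrative only; NumPy is assumed) draws many sample paths of \(X_{t}=Y_{t}Z\) and checks that distinct coordinates are empirically uncorrelated while their squares coincide path by path.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate the process of Example 6.3: X_t = Y_t * Z with independent
# Rademacher signs Y_t and a single standard Gaussian Z shared by all times.
R, T = 200_000, 4
Z = rng.standard_normal((R, 1))
Y = rng.choice([-1.0, 1.0], size=(R, T))
X = Y * Z

r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
print(r)                          # close to 0: uncorrelated at distinct times

# ... but not independent: X_s^2 = Z^2 = X_t^2 on every sample path.
print(np.allclose(X[:, 0] ** 2, X[:, 1] ** 2))   # True
```

The exact equality of the squared coordinates is the simulation counterpart of the observation \(X_{t}^{2}=Z^{2}\) used in the proof.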

Strong-sense stationarity is preserved under rather general transformations.

Let \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with state space \(\mathbb{R}^{M}\), for some \(M\in\mathbb{N}\).

Proposition 6.2 (Preservation of the SSS property under process transformation) Assume that the process \(\mathbf{X}\) is SSS. In addition, for a fixed finite sequence \(\left(t_{k}^{\left(0\right)}\right)_{k=1}^{n}\), for some \(n\in\mathbb{N}\), assume that the time set \(\mathbb{T}\) satisfies the following properties:

  1. the sequence \(\left(t_{k}^{\left(0\right)}+t\right)_{k=1}^{n}\) is in \(\mathcal{C}_{fin}\left(\mathbb{T}\right)\) for every \(t\in\mathbb{T}\);

  2. the sequences \(\left(t_{1}^{\left(0\right)}+t_{1},\dots,t_{n}^{\left(0\right)}+t_{1}, \dots,t_{1}^{\left(0\right)}+t_{L},\dots,t_{n}^{\left(0\right)}+t_{L}\right)\) and \(\left(t_{1}^{\left(0\right)}+t_{1}+\tau,\dots,t_{n}^{\left(0\right)}+t_{1}+\tau, \dots,t_{1}^{\left(0\right)}+t_{L}+\tau,\dots,t_{n}^{\left(0\right)}+t_{L}+\tau\right)\) are in \(\mathcal{C}_{fin}\left(\mathbb{T}\right)\) for every \(\mathbf{t}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), such that \(\mathbf{t}\equiv\left(t_{\ell}\right)_{\ell=1}^{L}\) for some \(L\in\mathbb{N}\), and every \(\tau\in\mathbb{R}\), such that \(\mathbf{t}_{\tau}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\).

Then, given any Borel map \(g:\mathsf{X}_{k=1}^{n}\mathbb{R}^{M}\rightarrow\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\), the process \(\left(Y_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{Y}\) on the probability space \(\Omega\) with state space \(\mathbb{R}^{N}\) given by \[\begin{equation} Y_{t}\overset{\text{def}}{=} g\left(X_{t_{1}^{\left(0\right)}+t},\dots,X_{t_{n}^{\left(0\right)}+t}\right), \quad\forall t\in\mathbb{T}, \tag{6.3} \end{equation}\] is SSS. Note that Properties 1. and 2. of the time set \(\mathbb{T}\) serve only to ensure that the process \(\mathbf{X}\) can be transformed by the map \(g\) and that the SSS property of the transformed process can be checked. For instance, they are obviously satisfied if \(\mathbb{T}=\mathbb{N}\), \(\mathbb{T}=\mathbb{Z}\), or \(\mathbb{T}=\mathbb{R}_{+}\).

Proof. Given any \(\mathbf{t}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), such that \(\mathbf{t}\equiv\left(t_{\ell}\right)_{\ell=1}^{L}\) for some \(L\in\mathbb{N}\), any \(\tau\in\mathbb{R}\), such that \(\mathbf{t}_{\tau}\equiv\left(t_{\ell}+\tau\right)_{\ell=1}^{L}\in\mathcal{C}_{fin}\left(\mathbb{T}\right)\), and any \(\left(y_{1},\dots,y_{L}\right)\in\mathsf{X}_{\ell=1}^{L}\mathbb{R}^{N}\), we can write \[\begin{align} & \mathbf{P}\left(Y_{t_{1}}\leq y_{1},\dots,Y_{t_{L}}\leq y_{L}\right)\\ & =\mathbf{P}\left(g\left(X_{t_{1}^{\left(0\right)}+t_{1}},\dots, X_{t_{n}^{\left(0\right)}+t_{1}}\right)\leq y_{1},\dots, g\left(X_{t_{1}^{\left(0\right)}+t_{L}},\dots, X_{t_{n}^{\left(0\right)}+t_{L}}\right)\leq y_{L}\right)\\ & =\mathbf{P}\left(\left(X_{t_{1}^{\left(0\right)}+t_{1}},\dots, X_{t_{n}^{\left(0\right)}+t_{1}}\right)\in g^{-1}\left(\left(-\infty,y_{1}\right]\right),\dots, \left(X_{t_{1}^{\left(0\right)}+t_{L}}, \dots,X_{t_{n}^{\left(0\right)}+t_{L}}\right)\in g^{-1}\left(\left(-\infty,y_{L}\right]\right)\right)\\ & =\mathbf{P}\left(\left(X_{t_{1}^{\left(0\right)}+t_{1}},\dots, X_{t_{n}^{\left(0\right)}+t_{1}},\dots,X_{t_{1}^{\left(0\right)}+t_{L}}, \dots,X_{t_{n}^{\left(0\right)}+t_{L}}\right)\in \mathsf{X}_{\ell=1}^{L}g^{-1}\left(\left(-\infty,y_{\ell}\right]\right)\right)\\ & =\mathbf{P}\left(\left(X_{t_{1}^{\left(0\right)}+t_{1}+\tau},\dots, X_{t_{n}^{\left(0\right)}+t_{1}+\tau},\dots,X_{t_{1}^{\left(0\right)}+t_{L}+\tau}, \dots,X_{t_{n}^{\left(0\right)}+t_{L}+\tau}\right)\in \mathsf{X}_{\ell=1}^{L}g^{-1}\left(\left(-\infty,y_{\ell}\right]\right)\right)\\ & =\mathbf{P}\left(\left(X_{t_{1}^{\left(0\right)}+t_{1}+\tau},\dots, X_{t_{n}^{\left(0\right)}+t_{1}+\tau}\right)\in g^{-1}\left(\left(-\infty,y_{1}\right]\right),\dots, \left(X_{t_{1}^{\left(0\right)}+t_{L}+\tau}, \dots,X_{t_{n}^{\left(0\right)}+t_{L}+\tau}\right)\in g^{-1}\left(\left(-\infty,y_{L}\right]\right)\right)\\ & =\mathbf{P}\left(g\left(X_{t_{1}^{\left(0\right)}+t_{1}+\tau},\dots, 
X_{t_{n}^{\left(0\right)}+t_{1}+\tau}\right)\leq y_{1},\dots, g\left(X_{t_{1}^{\left(0\right)}+t_{L}+\tau},\dots, X_{t_{n}^{\left(0\right)}+t_{L}+\tau}\right)\leq y_{L}\right)\\ & =\mathbf{P}\left(Y_{t_{1}+\tau}\leq y_{1},\dots,Y_{t_{L}+\tau}\leq y_{L}\right). \end{align}\] This proves that \(\mathbf{Y}\) is SSS.

Example 6.4 (Positive power of an SSS real process) Let \(\mathbf{X}\) be an SSS real stochastic process on a probability space \(\Omega\). Then, for any fixed \(K\in\mathbb{N}\), the process \(\left(Y_{t}\right)_{t\in\mathbb{T}}\) on \(\Omega\) given by \[ Y_{t}\overset{\text{def}}{=}X_{t}^{K},\quad\forall t\in\mathbb{T}, \] is SSS.

Example 6.5 (Power of an SSS strictly positive process) Let \(\mathbf{X}\) be an SSS strictly positive stochastic process on a probability space \(\Omega\). Then, for any fixed \(p\in\mathbb{R}\), the process \(\left(Y_{t}\right)_{t\in\mathbb{T}}\) on \(\Omega\) given by \[ Y_{t}\overset{\text{def}}{=}X_{t}^{p},\quad\forall t\in\mathbb{T}, \] is SSS.

Example 6.6 (Exponential of an SSS real process) Let \(\mathbf{X}\) be an SSS real stochastic process on a probability space \(\Omega\). Then the process \(\left(Y_{t}\right)_{t\in\mathbb{T}}\) on \(\Omega\) given by \[ Y_{t}\overset{\text{def}}{=}\exp\left(X_{t}\right),\quad\forall t\in\mathbb{T}, \] is SSS.

Example 6.7 (Logarithm of an SSS strictly positive process) Let \(\mathbf{X}\) be an SSS strictly positive stochastic process on a probability space \(\Omega\). Then, the process \(\left(Y_{t}\right)_{t\in\mathbb{T}}\) on \(\Omega\) given by \[ Y_{t}\overset{\text{def}}{=}\log\left(X_{t}\right),\quad\forall t\in\mathbb{T}, \] is SSS.

Example 6.8 (Multiple product of an SSS real process) Let \(\mathbf{X}\) be an SSS real stochastic process on a probability space \(\Omega\). For simplicity, assume that \(\mathbb{T}\equiv\mathbb{N}\). Then, for any fixed \(n\in\mathbb{N}\), the process \(\left(Y_{t}\right)_{t\in\mathbb{N}}\) given by \[ Y_{t}\overset{\text{def}}{=}X_{t}\cdots X_{t+n},\quad\forall t\in\mathbb{N}, \] is SSS.
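Examples 6.4–6.8 can be probed numerically. The Python sketch below (illustrative only; NumPy is assumed) takes Example 6.8 with \(n=1\): starting from an i.i.d., hence SSS, Gaussian sequence, it forms \(Y_{t}=X_{t}X_{t+1}\) and checks that the per-time mean and variance of the transformed process do not depend on \(t\).

```python
import numpy as np

rng = np.random.default_rng(4)

# Sketch of Example 6.8 with n = 1: start from an iid (hence SSS) sequence X
# and form Y_t = X_t * X_{t+1}.  The transformed process is again SSS, so the
# per-time distribution of Y_t should not depend on t; check first moments.
R, T = 100_000, 6
X = rng.standard_normal((R, T))
Y = X[:, :-1] * X[:, 1:]          # Y[:, t] = X_t * X_{t+1}

means = Y.mean(axis=0)            # all close to E[X_t]^2 = 0
vars_ = Y.var(axis=0)             # all close to Var(X_t X_{t+1}) = 1
print(np.max(np.abs(means)), vars_.round(2))
```

Constant first and second moments are of course only a necessary symptom of stationarity; the full SSS property is what Proposition 6.2 guarantees.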

7 Weak-Sense Stationary (WSS) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with state space \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\). Assume that \(\mathbf{X}\) has order \(2\).

Definition 7.1 (Weak-sense stationary (WSS) process) We say that \(\mathbf{X}\) is weak-sense stationary (WSS) or weakly stationary or covariance stationary if we have:

  1. \(\mu_{\mathbf{X}}\left(s\right)=\mu_{\mathbf{X}}\left(t\right), \quad\forall s,t\in\mathbb{T}\);

  2. \(\Gamma_{\mathbf{X}}\left(s,t\right)=\Gamma_{\mathbf{X}}\left(s+\tau,t+\tau\right), \quad\forall s,t\in\mathbb{T}\text{ and }\forall\tau\in\mathbb{R}\text{ s.t. }s+\tau\text{ and } t+\tau\in\mathbb{T}\);

where \(\mu_{\mathbf{X}}:\mathbb{T}\rightarrow\mathbb{R}^{N}\) [resp. \(\Gamma_{\mathbf{X}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\)] is the mean [resp. autocovariance] function of the process \(\mathbf{X}\) (see Definitions 5.2 and 5.5).

If the process \(\mathbf{X}\) is weak-sense stationary, given any \(t_{0}\in\mathbb{T}\), we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\mu_{\mathbf{X}}\left(t_{0}\right), \quad \Sigma_{\mathbf{X}}^{2}\left(t\right)=\Sigma_{\mathbf{X}}^{2}\left(t_{0}\right), \quad \mathrm{P}_{\mathbf{X}}\left(t\right)=\mathrm{P}_{\mathbf{X}}\left(t_{0}\right), \end{equation}\] for every \(t\in\mathbb{T}\), where \(\Sigma_{\mathbf{X}}^{2}:\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) [resp. \(\mathrm{P}_{\mathbf{X}}:\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\)] is the variance-covariance [resp. correlation] function of the process \(\mathbf{X}\) (see Definitions 5.3 and 5.4).

If the process \(\mathbf{X}\) is weak-sense stationary, we write \(\mu_{\mathbf{X}}\) [resp. \(\Sigma_{\mathbf{X}}^{2}\), \(\mathrm{P}_{\mathbf{X}}\)] for the constant value of the mean [resp. variance-covariance, correlation] function of \(\mathbf{X}\) and we refer to \(\mu_{\mathbf{X}}\) [resp. \(\Sigma_{\mathbf{X}}^{2}\), \(\mathrm{P}_{\mathbf{X}}\)] as the mean [resp. variance-covariance, correlation] of \(\mathbf{X}\).

Proposition 7.1 (SSS process as WSS process) If \(\mathbf{X}\) is a strong-sense stationary process of order \(2\), then \(\mathbf{X}\) is weak-sense stationary.

Proposition 7.2 (Uncorrelated and identically distributed process as WSS process) If \(\mathbf{X}\) is a process of order \(2\) and the random variables in \(\mathbf{X}\) are uncorrelated and identically distributed, then \(\mathbf{X}\) is a weak-sense stationary process.

Proposition 7.3 (Autocovariance and autocorrelation of a WSS process) If the process \(\mathbf{X}\) is weak-sense stationary, given any \(t_{0}\in\mathbb{T}\), we have \[\begin{equation} \Gamma_{\mathbf{X}}\left(s,t\right)=\Gamma_{\mathbf{X}}\left(t_{0},t_{0}+\left(t-s\right)\right), \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(t_{0}+\left(t-s\right)\in\mathbb{T}\). Under the further assumption \(\det\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)\neq 0\), we also have \[\begin{equation} \mathrm{P}_{\mathbf{X}}\left(s,t\right)=\mathrm{P}_{\mathbf{X}}\left(t_{0},t_{0}+\left(t-s\right)\right), \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(t_{0}+\left(t-s\right)\in\mathbb{T}\), where \(\mathrm{P}_{\mathbf{X}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) is the autocorrelation function of the process \(\mathbf{X}\) (see Definition 5.7).

Proof. Under the weak-sense stationarity assumption, for all \(s,t\in\mathbb{T}\) such that \(t_{0}+\left(t-s\right)\in\mathbb{T}\), setting \(\tau\equiv t_{0}-s\), we can write \[\begin{equation} \Gamma_{\mathbf{X}}\left(s,t\right)=\Gamma_{\mathbf{X}}\left(s+\tau,t+\tau\right) =\Gamma_{\mathbf{X}}\left(s+\left(t_{0}-s\right),t+\left(t_{0}-s\right)\right) =\Gamma_{\mathbf{X}}\left(t_{0},t_{0}+\left(t-s\right)\right). \end{equation}\] In addition, we have \[\begin{equation} \Sigma_{\mathbf{X}}^{2}\left(t\right)=\Sigma_{\mathbf{X}}^{2}\left(s\right) =\Sigma_{\mathbf{X}}^{2}\left(t_{0}+\left(t-s\right)\right)=\Sigma_{\mathbf{X}}^{2}\left(t_{0}\right) =\Sigma_{\mathbf{X}}^{2}, \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(t_{0}+\left(t-s\right)\in\mathbb{T}\). In light of the above, in case \(\det\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)\neq 0\), we can also write \[\begin{align} \mathrm{P}_{\mathbf{X}}\left(s,t\right) &=\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(s\right)\right)^{-\frac{1}{2}} \Gamma_{\mathbf{X}}\left(s,t\right) \left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(t\right)\right)^{-\frac{1}{2}}\\ &=\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(t_{0}\right)\right)^{-\frac{1}{2}} \Gamma_{\mathbf{X}}\left(t_{0},t_{0}+\left(t-s\right)\right) \left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(t_{0}+\left(t-s\right)\right)\right)^{-\frac{1}{2}}\\ &=\mathrm{P}_{\mathbf{X}}\left(t_{0},t_{0}+\left(t-s\right)\right). \end{align}\] This completes the proof.

Definition 7.2 (Reduced autocovariance and autocorrelation of a WSS process) If the process \(\mathbf{X}\) is weak-sense stationary, choosing any \(t_{0}\in\mathbb{T}\) and setting \[\begin{equation} \mathbb{T}_{0}\equiv\left\{\tau\in\mathbb{R}:t_{0}+\tau\in\mathbb{T}\right\}, \end{equation}\] we call the map \(\Gamma_{\mathbf{X},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \Gamma_{\mathbf{X},t_{0}}\left(\tau\right) \overset{\text{def}}{=}\Gamma_{\mathbf{X}}\left(t_{0},t_{0}+\tau\right), \quad\forall\tau\in\mathbb{T}_{0}, \tag{7.1} \end{equation}\] the reduced autocovariance function of \(\mathbf{X}\) referred to \(t_{0}\) and, for any \(\tau\in\mathbb{T}_{0}\), we call the matrix \(\Gamma_{\mathbf{X},t_{0}}\left(\tau\right)\) the \(\tau\)-shifted (lagged) autocovariance of \(\mathbf{X}\) referred to \(t_{0}\). In case \(\det\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)\neq 0\), we call the map \(\mathrm{P}_{\mathbf{X},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \mathrm{P}_{\mathbf{X},t_{0}}\left(\tau\right) \overset{\text{def}}{=}\mathrm{P}_{\mathbf{X}}\left(t_{0},t_{0}+\tau\right), \quad\forall\tau\in\mathbb{T}_{0}, \tag{7.2} \end{equation}\] the reduced autocorrelation function of \(\mathbf{X}\) referred to \(t_{0}\) and, for any \(\tau\in\mathbb{T}_{0}\), we call the matrix \(\mathrm{P}_{\mathbf{X},t_{0}}\left(\tau\right)\) the \(\tau\)-shifted (lagged) autocorrelation of \(\mathbf{X}\) referred to \(t_{0}\).

Proposition 7.4 (Reduced autocovariance and autocorrelation of a WSS Process) Assume that the process \(\mathbf{X}\) is weak-sense stationary and, for some \(t_{0}\in\mathbb{T}\), let \(\Gamma_{\mathbf{X},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) [resp. \(\mathrm{P}_{\mathbf{X},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\)] be the reduced autocovariance [resp. autocorrelation] function of \(\mathbf{X}\) referred to \(t_{0}\). Then we have \[\begin{equation} \mathrm{P}_{\mathbf{X},t_{0}}\left(\tau\right) =\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)^{-\frac{1}{2}} \Gamma_{\mathbf{X},t_{0}}\left(\tau\right) \left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)^{-\frac{1}{2}} \tag{7.3} \end{equation}\] and \[\begin{equation} \Gamma_{\mathbf{X},t_{0}}\left(\tau\right) =\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)^{\frac{1}{2}} \mathrm{P}_{\mathbf{X},t_{0}}\left(\tau\right) \left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)^{\frac{1}{2}} \tag{7.4} \end{equation}\] for every \(\tau\in\mathbb{T}_{0}\).

Proof. Under the weak-sense stationarity assumption, we can write \[\begin{align} \mathrm{P}_{\mathbf{X},t_{0}}\left(\tau\right) & =\mathrm{P}_{\mathbf{X}}\left(t_{0},t_{0}+\tau\right)\\ & =\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(t_{0}\right)\right)^{-\frac{1}{2}} \Gamma_{\mathbf{X}}\left(t_{0},t_{0}+\tau\right) \left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\left(t_{0}+\tau\right)\right)^{-\frac{1}{2}}\\ & =\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)^{-\frac{1}{2}} \Gamma_{\mathbf{X},t_{0}}\left(\tau\right) \left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)^{-\frac{1}{2}}. \end{align}\] This proves (7.3). Equation (7.4) immediately follows.
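Equations (7.3) and (7.4) say that passing between the reduced autocovariance and the reduced autocorrelation is a congruence by the (inverse) square root of \(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\). A minimal numerical sketch (in Python with NumPy rather than R, purely illustrative; the matrices below are invented):

```python
import numpy as np

# A hypothetical lagged autocovariance matrix and variance-covariance matrix (N = 2)
gamma_tau = np.array([[0.8, 0.3],
                      [0.2, 0.5]])
sigma2 = np.array([[2.0, 0.4],
                   [0.4, 1.0]])

# (diag Sigma^2)^{-1/2}: inverse square root of the diagonal part of sigma2
d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(sigma2)))

# Equation (7.3): reduced autocorrelation from reduced autocovariance
p_tau = d_inv_sqrt @ gamma_tau @ d_inv_sqrt

# Equation (7.4): recover the autocovariance from the autocorrelation
d_sqrt = np.diag(np.sqrt(np.diag(sigma2)))
gamma_back = d_sqrt @ p_tau @ d_sqrt

print(np.allclose(gamma_back, gamma_tau))  # the two congruences are mutually inverse
```

The two rescalings undo each other exactly because \(\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)^{\frac{1}{2}}\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)^{-\frac{1}{2}}=I_{N}\).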

Note that for any \(t_{0}\in\mathbb{T}\) we have \(0\in\mathbb{T}_{0}\). Furthermore, \[\begin{equation} \Gamma_{\mathbf{X},t_{0}}\left(0\right)=\Sigma_{\mathbf{X}}^{2} \quad\text{and}\quad\mathrm{P}_{\mathbf{X},t_{0}}\left(0\right)=\mathrm{P}_{\mathbf{X}}. \end{equation}\]

Proposition 7.5 (Reduced autocovariance and autocorrelation of real WSS process) In case \(N=1\), assume that \(\mathbf{X}\) is weak-sense stationary and let \(\gamma_{\mathbf{X},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}\) [resp. \(\rho_{\mathbf{X},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}\)] be the reduced autocovariance [resp. autocorrelation] function of \(\mathbf{X}\), referred to some \(t_{0}\in\mathbb{T}\). Then we have \[\begin{equation} \rho_{\mathbf{X},t_{0}}\left(\tau\right) =\frac{\gamma_{\mathbf{X},t_{0}}\left(\tau\right)}{\gamma_{\mathbf{X},t_{0}}\left(0\right)} =\frac{\gamma_{\mathbf{X},t_{0}}\left(\tau\right)}{\sigma_{\mathbf{X}}^{2}}, \tag{7.5} \end{equation}\] for every \(\tau\in\mathbb{T}_{0}\). In particular, \[\begin{equation} \rho_{\mathbf{X},t_{0}}\left(0\right)=1. \tag{7.6} \end{equation}\] Furthermore, we have
\[\begin{equation} \gamma_{\mathbf{X},t_{0}}\left(-\tau\right)=\gamma_{\mathbf{X},t_{0}}\left(\tau\right) \quad\text{[resp. } \rho_{\mathbf{X},t_{0}}\left(-\tau\right)=\rho_{\mathbf{X},t_{0}}\left(\tau\right)\text{]}, \tag{7.7} \end{equation}\] for every \(\tau\in\mathbb{T}_{0}\) such that \(-\tau\in\mathbb{T}_{0}\), and \[\begin{equation} \left\vert\gamma_{\mathbf{X},t_{0}}\left(\tau\right)\right\vert\leq\sigma_{\mathbf{X}}^{2} \quad\text{[resp. } \left\vert\rho_{\mathbf{X},t_{0}}\left(\tau\right)\right\vert\leq1\text{]}, \tag{7.8} \end{equation}\] for every \(\tau\in\mathbb{T}_{0}\).

Proposition 7.6 (Partial autocovariance and partial autocorrelation of a WSS process) If the process \(\mathbf{X}\) is weak-sense stationary, given any \(t_{0}\in\mathbb{T}\), we have \[\begin{equation} \Phi_{\mathbf{X}}\left(s,t\right)=\Phi_{\mathbf{X}}\left(t_{0},t_{0}+\left(t-s\right)\right), \end{equation}\] where \(\Phi_{\mathbf{X}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) is the partial autocovariance function of the process \(\mathbf{X}\) (see Definition 5.8). Under the further assumption \(\det\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)\neq 0\), we also have \[\begin{equation} \Psi_{\mathbf{X}}\left(s,t\right)=\Psi_{\mathbf{X}}\left(t_{0},t_{0}+\left(t-s\right)\right), \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(t_{0}+\left(t-s\right)\in\mathbb{T}\), where \(\Psi_{\mathbf{X}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) is the partial autocorrelation function of the process \(\mathbf{X}\) (see Definition 5.9).

Definition 7.3 (Reduced partial autocovariance and autocorrelation of a WSS process) If the process \(\mathbf{X}\) is weak-sense stationary, choosing any \(t_{0}\in\mathbb{T}\) and setting \[\begin{equation} \mathbb{T}_{0}\equiv\left\{\tau\in\mathbb{R}:t_{0}+\tau\in\mathbb{T}\right\}, \end{equation}\] we call the map \(\Phi_{\mathbf{X},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \Phi_{\mathbf{X},t_{0}}\left(\tau\right) \overset{\text{def}}{=}\Phi_{\mathbf{X}}\left(t_{0},t_{0}+\tau\right), \quad\forall\tau\in\mathbb{T}_{0}, \tag{7.9} \end{equation}\] the reduced partial autocovariance function of \(\mathbf{X}\) referred to \(t_{0}\) and, for any \(\tau\in\mathbb{T}_{0}\), we call the matrix \(\Phi_{\mathbf{X},t_{0}}\left(\tau\right)\) the \(\tau\)-shifted (lagged) partial autocovariance of \(\mathbf{X}\) referred to \(t_{0}\). In case \(\det\left(\operatorname*{diag}\Sigma_{\mathbf{X}}^{2}\right)\neq 0\), we call the map \(\Psi_{\mathbf{X},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) given by \[\begin{equation} \Psi_{\mathbf{X},t_{0}}\left(\tau\right) \overset{\text{def}}{=}\Psi_{\mathbf{X}}\left(t_{0},t_{0}+\tau\right), \quad\forall\tau\in\mathbb{T}_{0}, \tag{7.10} \end{equation}\] the reduced partial autocorrelation function of \(\mathbf{X}\) referred to \(t_{0}\) and, for any \(\tau\in\mathbb{T}_{0}\), we call the matrix \(\Psi_{\mathbf{X},t_{0}}\left(\tau\right)\) the \(\tau\)-shifted (lagged) partial autocorrelation of \(\mathbf{X}\) referred to \(t_{0}\).

Theorem 7.1 (Partial autocorrelation function in case N=1 - Durbin-Levinson recursion theorem) In case \(N=1\), assume that \(\mathbf{X}\) is weak-sense stationary and, for simplicity, assume that \(\mathbb{T}\subseteq\mathbb{Z}\). Fix any \(t_0\in\mathbb{T}\) and \(\tau\in\mathbb{N}\) such that \(\tau>2\). Hence, consider the vector \[\begin{equation} \rho_{\mathbf{X},t_{0}}^{\left(\tau\right)} \equiv\left(\rho_{\mathbf{X},t_{0}}\left(1\right),\dots,\rho_{\mathbf{X},t_{0}}\left(\tau\right)\right)^{\intercal} \end{equation}\] and the matrix \[\begin{equation} \Delta_{\mathbf{X},t_{0}}\left(\tau\right) \equiv \begin{pmatrix} 1 & \rho_{\mathbf{X},t_{0}}\left(1\right) & \cdots & \rho_{\mathbf{X},t_{0}}\left(\tau-2\right) & \rho_{\mathbf{X},t_{0}}\left(\tau-1\right)\\ \rho_{\mathbf{X},t_{0}}\left(1\right) & 1 & \cdots & \rho_{\mathbf{X},t_{0}}\left(\tau-3\right) & \rho_{\mathbf{X},t_{0}}\left(\tau-2\right)\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ \rho_{\mathbf{X},t_{0}}\left(\tau-2\right) & \rho_{\mathbf{X},t_{0}}\left(\tau-3\right) & \cdots & 1 & \rho_{\mathbf{X},t_{0}}\left(1\right) \\ \rho_{\mathbf{X},t_{0}}\left(\tau-1\right) & \rho_{\mathbf{X},t_{0}}\left(\tau-2\right) & \cdots & \rho_{\mathbf{X},t_{0}}\left(1\right) & 1 \end{pmatrix}. \end{equation}\] Then the solution \[\begin{equation} \psi^{\left(\tau\right)}\equiv\left(\psi_{\tau,1},\dots,\psi_{\tau,\tau}\right)^{\intercal} \end{equation}\] of the equation \[\begin{equation} \Delta_{\mathbf{X},t_{0}}\left(\tau\right)\psi^{\left(\tau\right)} =\rho_{\mathbf{X},t_{0}}^{\left(\tau\right)} \end{equation}\] satisfies \[\begin{equation} \psi_{\tau,\tau}=\psi_{\mathbf{X}}\left(\tau\right). \end{equation}\] In particular, if the matrix \(\Delta_{\mathbf{X},t_{0}}\left(\tau\right)\) is non-singular, we have \[\begin{equation} \psi_{\mathbf{X}}\left(\tau\right) =\frac{\det\left(\Delta_{\mathbf{X},t_{0}}^{\ast}\left(\tau\right)\right)} {\det\left(\Delta_{\mathbf{X},t_{0}}\left(\tau\right)\right)}, \end{equation}\] where \(\Delta_{\mathbf{X},t_{0}}^{\ast}\left(\tau\right)\) is the matrix obtained from \(\Delta_{\mathbf{X},t_{0}}\left(\tau\right)\) by replacing its \(\tau\)th column with the vector \(\rho_{\mathbf{X},t_{0}}^{\left(\tau\right)}\).

Using the abbreviation \[\begin{equation} \rho_{\mathbf{X},t_{0}}\left(\tau\right)\equiv\rho_{\tau}, \end{equation}\] we have \[\begin{equation} \psi_{\mathbf{X}}\left(1\right)=\rho_{1}, \quad\psi_{\mathbf{X}}\left(2\right) =\tfrac{\left\vert \begin{array} [c]{cc} 1 & \rho_{1}\\ \rho_{1} & \rho_{2} \end{array} \right\vert}{\left\vert \begin{array} [c]{cc} 1 & \rho_{1}\\ \rho_{1} & 1 \end{array} \right\vert }=\tfrac{\rho_{2}-\rho_{1}^{2}}{1-\rho_{1}^{2}}, \quad\psi_{\mathbf{X}}\left(3\right)=\tfrac{\left\vert \begin{array} [c]{ccc} 1 & \rho_{1} & \rho_{1}\\ \rho_{1} & 1 & \rho_{2}\\ \rho_{2} & \rho_{1} & \rho_{3} \end{array} \right\vert }{\left\vert \begin{array} [c]{ccc} 1 & \rho_{1} & \rho_{2}\\ \rho_{1} & 1 & \rho_{1}\\ \rho_{2} & \rho_{1} & 1 \end{array} \right\vert}=\tfrac{\rho_{3}\left(1-\rho_{1}^{2}\right)+\rho_{1}\left( \rho_{1}^{2}+\rho_{2}^{2}-2\rho_{2}\right)}{\left(1-\rho_{2}\right) \left(1+\rho_{2}-2\rho_{1}^{2}\right)} \end{equation}\]
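The Durbin-Levinson characterization can be checked numerically: solve the Toeplitz system \(\Delta_{\mathbf{X},t_{0}}\left(\tau\right)\psi^{\left(\tau\right)}=\rho_{\mathbf{X},t_{0}}^{\left(\tau\right)}\) and read off the last component of the solution. A sketch in Python with NumPy (purely illustrative; the autocorrelation values \(\rho_{1},\rho_{2}\) below are invented):

```python
import numpy as np

# Hypothetical lag-1 and lag-2 autocorrelations
rho = {1: 0.6, 2: 0.3}

def pacf_value(tau, rho):
    """Last component of the solution of Delta(tau) psi = rho^(tau)."""
    # Toeplitz matrix with entries rho(|i-j|) and 1 on the diagonal
    delta = np.array([[1.0 if i == j else rho[abs(i - j)]
                       for j in range(tau)] for i in range(tau)])
    rhs = np.array([rho[k] for k in range(1, tau + 1)])
    psi = np.linalg.solve(delta, rhs)
    return psi[-1]

psi2 = pacf_value(2, rho)
closed_form = (rho[2] - rho[1] ** 2) / (1 - rho[1] ** 2)
print(abs(psi2 - closed_form) < 1e-12)  # matches the worked formula for psi_X(2)
```

For \(\tau=1\) the system is trivial and returns \(\psi_{\mathbf{X}}\left(1\right)=\rho_{1}\), in agreement with the display above.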

8 Ergodic Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) be a stochastic process of order \(2\) on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with state space \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\). For a given \(t_{0}\in\mathbb{T}\), consider the \(N\)-variate real random variable \(X_{t_{0}}\) and let \(X_{t_{0}}^{\left(1\right)},\dots,X_{t_{0}}^{\left(n\right)}\) be a simple random sample of size \(n\in\mathbb{N}\) drawn from \(X_{t_{0}}\). From basic Statistics, we know that the sample mean of size \(n\in\mathbb{N}\) drawn from \(X_{t_{0}}\), which is given by \[\begin{equation} \bar{X}_{t_{0},n}\overset{\text{def}}{=} \frac{1}{n}\sum\limits_{k=1}^{n}X_{t_{0}}^{\left(k\right)}, \end{equation}\] is an unbiased estimator of \(\mathbf{E}\left[X_{t_{0}}\right]\equiv\mu_{X_{t_{0}}}\), that is \[\begin{equation} \mathbf{E}\left[\bar{X}_{t_{0},n}\right]=\mu_{X_{t_{0}}}, \end{equation}\] and its mean square error is given by \[\begin{equation} MSE\left(\bar{X}_{t_{0},n}\right)=\frac{1}{n}\operatorname*{trace}\left(Var\left(X_{t_{0}}\right)\right), \end{equation}\] where \(\operatorname*{trace}:\mathbb{R}^{N}\times\mathbb{R}^{N}\rightarrow\mathbb{R}\) is the trace operator. Moreover, by virtue of the Law of Large Numbers for random variables with finite moment of order \(2\), we know that \(\bar{X}_{t_{0},n}\) is mean square error consistent, that is \[\begin{equation} \bar{X}_{t_{0},n}\overset{\mathbf{L}^{2}}{\rightarrow}\mu_{X_{t_{0}}}, \end{equation}\] as \(n\rightarrow\infty\). A fortiori, \[\begin{equation} \bar{X}_{t_{0},n}\overset{\mathbf{P}}{\rightarrow}\mu_{X_{t_{0}}}. \end{equation}\] Therefore, the sample mean \(\bar{X}_{t_{0},n}\) is a “good” estimator of \(\mu_{X_{t_{0}}}\). 
Note that the estimate of \(\mu_{X_{t_{0}}}\) by means of the estimator \(\bar{X}_{t_{0},n}\), on the occurrence of an outcome \(\omega\in\Omega\), can be written as \[\begin{equation} \bar{X}_{t_{0},n}\left(\omega\right)\equiv \frac{1}{n}\sum\limits_{k=1}^{n}X_{t_{0}}^{\left(k\right)}\left(\omega\right)\equiv \frac{1}{n}\sum\limits_{k=1}^{n}x_{t_{0}}^{\left(k\right)}, \end{equation}\] where \(X_{t_{0}}^{\left(1\right)}\left(\omega\right)\equiv x_{t_{0}}^{\left(1\right)},\dots,X_{t_{0}}^{\left(n\right)}\left(\omega\right)\equiv x_{t_{0}}^{\left(n\right)}\) are the realizations of the independent and identically distributed random variables \(X_{t_{0}}^{\left(1\right)},\dots,X_{t_{0}}^{\left(n\right)}\). Now, if we assume that the process \(\mathbf{X}\) is at least weak-sense stationary, we could use \(\bar{X}_{t_{0},n}\) as a “good” estimator of the mean \(\mu_{\mathbf{X}}\left(t\right)=\mu_{\mathbf{X}}\left(t_{0}\right)\equiv\mu_{\mathbf{X}}\) of the process. In this context, the estimator \(\bar{X}_{t_{0},n}\) is referred to as the ensemble average estimator of size \(n\) of \(\mu_{\mathbf{X}}\). However, unless the random variables in the process \(\mathbf{X}\) are independent and identically distributed, we cannot observe the realizations of \(n\) independent copies of \(X_{t_{0}}\), but only the realizations \(X_{t_{1}}\left(\omega\right),\dots,X_{t_{n}}\left(\omega\right)\) of the random variables in \(\mathbf{X}\) at different times \(t_{1},\dots,t_{n}\in\mathbb{T}\); and we know that, even in the case of strong-sense stationary processes, these variables need not be independent, nor even uncorrelated. Hence, the question arises to what extent we can use the information provided by \(X_{t_{1}}\left(\omega\right),\dots,X_{t_{n}}\left(\omega\right)\) to estimate some traits of the process \(\mathbf{X}\). This leads us to introduce the idea of ergodicity.

Assume that \(\mathbf{X}\) is weak-sense stationary. Furthermore, for simplicity, assume that \(\mathbb{T}\equiv\mathbb{N}\) or \(\mathbb{T}\equiv\mathbb{Z}\). Write \(\mu_{\mathbf{X}}\) [resp. \(\Sigma_{\mathbf{X}}^{2}\)] for the constant value of the mean [resp. variance-covariance] function of \(\mathbf{X}\).

Definition 8.1 (Time average estimator) Fixed any \(T\in\mathbb{N}\), we call the time average estimator of size \(T\) of \(\mathbf{X}\) the statistic \[\begin{equation} \bar{X}_{T}\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \frac{1}{T}\sum\limits_{t=1}^{T}X_{t}, & \text{if }\mathbb{T}\equiv\mathbb{N},\\ \frac{1}{2T+1}\sum\limits_{t=-T}^{T}X_{t}, & \text{if }\mathbb{T}\equiv\mathbb{Z}. \end{array} \right. \tag{8.1} \end{equation}\]

Definition 8.2 (Time variance-covariance estimator) Fixed any \(T\in\mathbb{N}\), we call the time variance-covariance estimator of size \(T\) of \(\mathbf{X}\) the statistic \[\begin{equation} S_{\mathbf{X},T}^{2}\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \frac{1}{T}\sum\limits_{t=1}^{T}\left(X_{t}-\bar{X}_{T}\right)\left(X_{t}-\bar{X}_{T}\right)^{\intercal}, & \text{if }\mathbb{T}\equiv\mathbb{N},\\ \frac{1}{2T+1}\sum\limits_{t=-T}^{T}\left(X_{t}-\bar{X}_{T}\right)\left(X_{t}-\bar{X}_{T}\right)^{\intercal}, & \text{if }\mathbb{T}\equiv\mathbb{Z}. \end{array} \right. \tag{8.2} \end{equation}\]

Definition 8.3 (Autocovariance estimator) Fixed any \(T\in\mathbb{N}\), we call autocovariance estimator of size \(T\) of \(\mathbf{X}\) at the shift (lag) \(\tau\) the statistic \[\begin{equation} C_{\mathbf{X},T}\left(\tau\right)\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \frac{1}{T}\sum\limits_{t=1}^{T-\tau}\left(X_{t}-\bar{X}_{T}\right) \left(X_{t+\tau}-\bar{X}_{T}\right)^{\intercal}, & \text{if }\mathbb{T}\equiv\mathbb{N},\\ \frac{1}{2T+1}\sum\limits_{t=-T}^{T-\tau}\left(X_{t}-\bar{X}_{T}\right) \left(X_{t+\tau}-\bar{X}_{T}\right)^{\intercal}, & \text{if }\mathbb{T}\equiv\mathbb{Z}, \end{array} \right. \quad\forall\tau=0,1,\dots,T-1. \tag{8.3} \end{equation}\]

Note that in Equation (8.3) the factor \(1/\left(T-\tau\right)\) [resp. \(1/\left(2T+1-\tau\right)\)] is sometimes used in place of \(1/T\) [resp. \(1/\left(2T+1\right)\)].

Note that \[\begin{equation} C_{\mathbf{X},T}\left(0\right)=S_{\mathbf{X},T}^{2}. \end{equation}\]

Definition 8.4 (Time autocorrelation estimator) Fixed any \(T\in\mathbb{N}\), assume that \(\det\left(\operatorname*{diag}\left(C_{\mathbf{X},T}\left(0\right)\right)\right)\neq 0\), where \(\det:\mathbb{R}^{N}\times\mathbb{R}^{N}\rightarrow\mathbb{R}\) is the determinant functional and \(\operatorname*{diag}\left(C_{\mathbf{X},T}\left(0\right)\right)\) is the diagonal matrix having for diagonal entries the corresponding diagonal entries of \(C_{\mathbf{X},T}\left(0\right)\). We call time autocorrelation estimator of size \(T\) of \(\mathbf{X}\) at the shift (lag) \(\tau\) the statistic \[\begin{equation} R_{\mathbf{X},T}\left(\tau\right)\overset{\text{def}}{=} \left(\operatorname*{diag}\left(C_{\mathbf{X},T}\left(0\right)\right)\right)^{-\frac{1}{2}} C_{\mathbf{X},T}\left(\tau\right) \left(\operatorname*{diag}\left(C_{\mathbf{X},T}\left(0\right)\right)\right)^{-\frac{1}{2}}, \quad\forall\tau=0,1,\dots,T-1. \tag{8.4} \end{equation}\]
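For a univariate sample path (\(N=1\), \(\mathbb{T}\equiv\mathbb{N}\)), Definitions 8.1–8.4 reduce to familiar sample formulas. A minimal sketch in Python with NumPy (purely illustrative; the data are invented):

```python
import numpy as np

x = np.array([1.2, 0.7, 1.5, 0.9, 1.1, 1.4, 0.8, 1.0])  # a made-up sample path
T = len(x)

x_bar = x.mean()  # time average estimator, Equation (8.1)

def c_hat(tau):
    """Autocovariance estimator, Equation (8.3), with the 1/T factor, N = 1."""
    return np.sum((x[:T - tau] - x_bar) * (x[tau:] - x_bar)) / T

s2 = c_hat(0)  # time variance estimator, Equation (8.2): C(0) = S^2

def r_hat(tau):
    """Autocorrelation estimator, Equation (8.4), N = 1: C(tau) / C(0)."""
    return c_hat(tau) / s2

print(r_hat(0))  # always 1 at lag 0, by construction
```

Replacing `/ T` with `/ (T - tau)` in `c_hat` gives the alternative normalization mentioned after Equation (8.3).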

Proposition 8.1 (Time average estimator) For any \(T\in\mathbb{N}\), the time average estimator \(\bar{X}_{T}\) of size \(T\) of \(\mathbf{X}\) is an unbiased estimator of \(\mu_{\mathbf{X}}\) and its mean squared error is given by \[\begin{equation} \operatorname*{trace}\left(Var\left(\bar{X}_{T}\right)\right)=\left\{ \begin{array} [c]{ll} \frac{1}{T}\left(\sum\limits_{\tau=-\left(T-1\right)}^{T-1} \left(1-\frac{\left\vert\tau\right\vert}{T}\right) \operatorname*{trace}\left(\Gamma_{\mathbf{X},1}\left(\tau\right)\right)\right),& \text{if } \mathbb{T}\equiv\mathbb{N},\\ \frac{1}{2T+1}\left(\sum\limits_{\tau=-2T}^{2T} \left(1-\frac{\left\vert\tau\right\vert}{2T+1}\right) \operatorname*{trace}\left(\Gamma_{\mathbf{X},0}\left(\tau\right)\right)\right), & \text{if } \mathbb{T}\equiv\mathbb{Z}, \end{array} \right. \tag{8.5} \end{equation}\] where \(\Gamma_{\mathbf{X},1}\left(\tau\right)\) [resp. \(\Gamma_{\mathbf{X},0}\left(\tau\right)\)] is the value at the shift \(\tau\) of the reduced autocovariance function of \(\mathbf{X}\) referred to \(t_{0}=1\) [resp. \(t_{0}=0\)] (see Definition 7.2).

Corollary 8.1 (Time average estimator in case N=1) In case \(N=1\), for any \(T\in\mathbb{N}\), the mean squared error of \(\bar{X}_{T}\) is given by \[\begin{equation} \mathbf{D}^{2}\left[\bar{X}_{T}\right]=\left\{ \begin{array} [c]{ll} \frac{\sigma_{\mathbf{X}}^{2}}{T}\left(1+2\sum\limits_{\tau=1}^{T-1} \left(1-\frac{\tau}{T}\right)\rho_{\mathbf{X},1}\left(\tau\right)\right), & \text{if }\mathbb{T}\equiv\mathbb{N},\\ \frac{\sigma_{\mathbf{X}}^{2}}{2T+1}\left(1+2\sum\limits_{\tau=1}^{2T} \left(1-\frac{\tau}{2T+1}\right)\rho_{\mathbf{X},0}\left(\tau\right)\right), & \text{if }\mathbb{T}\equiv\mathbb{Z}, \end{array} \right. \end{equation}\] where \(\rho_{\mathbf{X},1}\left(\tau\right)\) [resp. \(\rho_{\mathbf{X},0}\left(\tau\right)\)] is the value at the shift \(\tau\) of the reduced autocorrelation function of \(\mathbf{X}\) referred to \(t_{0}=1\) [resp. \(t_{0}=0\)] (see Definition 7.2).

Definition 8.5 (Ergodicity in the mean) We say that the process \(\mathbf{X}\) is probability [resp. mean-square] ergodic in the mean if \[\begin{equation} \bar{X}_{T}\overset{\mathbf{P}}{\rightarrow}\mu_{\mathbf{X}} \quad\text{[resp. }\bar{X}_{T}\overset{\mathbf{L}^{2}}{\rightarrow}\mu_{\mathbf{X}}\text{]}, \tag{8.6} \end{equation}\] as \(T\rightarrow\infty\).

Remark (Ergodicity in the mean). If the process \(\mathbf{X}\) is mean-square ergodic in the mean, then \(\mathbf{X}\) is probability ergodic in the mean.

As a consequence of Definition 8.5, an SSS process need not be mean-square ergodic in the mean, nor even probability ergodic in the mean. For instance, with reference to the process described in Example 6.2, we clearly have \[\begin{equation} \bar{X}_{T}=X, \end{equation}\] for every \(T\). Therefore, unless \(X\) is a Dirac random variable, \[\begin{equation} \bar{X}_{T}\overset{\mathbf{P}}{\not\rightarrow}\mu_{X}. \end{equation}\]

Proposition 8.2 (Necessary and sufficient condition for ergodicity in the mean) The process \(\mathbf{X}\) is mean-square ergodic in the mean if and only if \[\begin{equation} \lim_{T\rightarrow+\infty}Var\left(\bar{X}_{T}\right)=0. \tag{8.7} \end{equation}\]

Assume now \(\mathbb{T}\equiv\mathbb{N}\). We then have the following result.

Theorem 8.1 (Necessary and sufficient condition for ergodicity in the mean (Slutsky theorem)) Assume \(N=1\). Then the process \(\mathbf{X}\) is mean-square ergodic in the mean if and only if \[\begin{equation} \lim_{T\rightarrow+\infty}\frac{1}{T}\sum\limits_{\tau=0}^{T-1}\gamma_{\mathbf{X},1}\left(\tau\right)=0. \end{equation}\]

Remark (Slutsky condition). We have \[\begin{equation} \frac{1}{T}\sum\limits_{\tau=0}^{T-1}\gamma_{\mathbf{X},1}\left(\tau\right) =Cov\left(\bar{X}_{T},X_{T}\right). \end{equation}\] Hence the Slutsky condition states that the covariance between the time average \(\bar{X}_{T}\) and the last observation \(X_{T}\) vanishes as \(T\rightarrow+\infty\).

Lemma 8.1 (Cesàro lemma) Let \(\left(a_{n}\right)\) be a sequence of real numbers such that \[\begin{equation} \lim_{n\rightarrow\infty}a_{n}=a\in\mathbb{R}. \tag{8.8} \end{equation}\] Then, we have \[\begin{equation} \lim_{n\rightarrow\infty}\frac{1}{n}\sum\limits_{k=1}^{n}a_{k}=a. \tag{8.9} \end{equation}\]

Proof. Under Assumption (8.8), for any \(\varepsilon>0\) there exists \(m_{\varepsilon}\) such that for every \(n>m_{\varepsilon}\) we have \[ \left\vert a_{n}-a\right\vert <\frac{\varepsilon}{2}. \] Therefore, we can write \[\begin{align} \left\vert \frac{1}{n}\sum\limits_{k=1}^{n}a_{k}-a\right\vert & =\left\vert \frac{1}{n}\sum\limits_{k=1}^{n}\left( a_{k}-a\right) \right\vert \leq \frac{1}{n}\sum\limits_{k=1}^{n}\left\vert a_{k}-a\right\vert \\ & =\frac{1}{n}\left( \sum\limits_{k=1}^{m_{\varepsilon}}\left\vert a_{k}-a\right\vert +\sum\limits_{k=m_{\varepsilon}+1}^{n}\left\vert a_{k}-a\right\vert \right) \\ & <\frac{1}{n}\left( \sum\limits_{k=1}^{m_{\varepsilon}}\left\vert a_{k}-a\right\vert +\sum\limits_{k=m_{\varepsilon}+1}^{n}\frac{\varepsilon}{2}\right) \\ & =\frac{1}{n}\left( \sum\limits_{k=1}^{m_{\varepsilon}}\left\vert a_{k}-a\right\vert +\left( n-m_{\varepsilon}\right) \frac{\varepsilon}{2}\right) \\ & =\frac{1}{n}\sum\limits_{k=1}^{m_{\varepsilon}}\left\vert a_{k}-a\right\vert +\frac{n-m_{\varepsilon}}{n}\frac{\varepsilon}{2}\\ & \leq\frac{1}{n}\sum\limits_{k=1}^{m_{\varepsilon}}\left\vert a_{k}-a\right\vert +\frac{\varepsilon}{2}. \end{align}\] In the end, since the sum \(\sum\limits_{k=1}^{m_{\varepsilon}}\left\vert a_{k}-a\right\vert\) does not depend on \(n\), we can choose \(n_{\varepsilon}>m_{\varepsilon}\) such that \[ \frac{1}{n}\sum\limits_{k=1}^{m_{\varepsilon}}\left\vert a_{k}-a\right\vert <\frac{\varepsilon}{2}. \] We then obtain \[ \left\vert\frac{1}{n}\sum\limits_{k=1}^{n}a_{k}-a\right\vert <\varepsilon, \] for every \(n>n_{\varepsilon}\), which proves the claimed limit.
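A toy numerical illustration of the lemma (in Python, purely illustrative; the sequence \(a_{n}=1+1/n\), with limit \(a=1\), is just an example):

```python
# Cesaro lemma: if a_n -> a, the running averages (1/n) sum_{k<=n} a_k -> a as well.
n = 100_000
a_seq = [1.0 + 1.0 / k for k in range(1, n + 1)]  # a_n -> a = 1

partial = 0.0
cesaro = []
for k, a_k in enumerate(a_seq, start=1):
    partial += a_k
    cesaro.append(partial / k)  # running Cesaro average

print(cesaro[-1])  # close to the limit a = 1
```

Here the Cesàro average equals \(1+H_{n}/n\), where \(H_{n}\) is the \(n\)th harmonic number, so it converges to \(1\) even though more slowly than \(a_{n}\) itself.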

Theorem 8.2 (Sufficient condition for ergodicity in the mean (Slutsky)) In the case \(N=1\), assume that \[\begin{equation} \lim_{\tau\rightarrow+\infty}\gamma_{\mathbf{X},1}\left(\tau\right)=0. \end{equation}\] Then the process \(\mathbf{X}\) is mean-square ergodic in the mean.

Corollary 8.2 (Sufficient condition for ergodicity in the mean in case N=1 (Slutsky)) In the case \(N=1\), assume that \[\begin{equation} \lim_{\tau\rightarrow+\infty}\rho_{\mathbf{X},1}\left(\tau\right)=0. \end{equation}\] Then the process \(\mathbf{X}\) is mean-square ergodic in the mean.

Theorem 8.3 (Sufficient condition for ergodicity in the mean (Law of Large Numbers)) Assume that the random variables in the process \(\mathbf{X}\) are uncorrelated. Then the process \(\mathbf{X}\) is mean-square ergodic in the mean.
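Theorem 8.3 can be illustrated by simulation: for an i.i.d. (hence uncorrelated) process, the time average settles near the process mean as \(T\) grows. A Python sketch (simulated Gaussian noise with a fixed seed; the mean value \(2\) is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely illustrative

mu = 2.0
# An i.i.d. (hence uncorrelated) process with mean mu: Theorem 8.3 applies
sample_path = mu + rng.standard_normal(100_000)

# Time averages over growing windows; Var(X_bar_T) = sigma^2 / T -> 0
for T in (10, 1_000, 100_000):
    print(T, sample_path[:T].mean())
```

The fluctuations of the printed time averages around \(\mu\) shrink at the rate \(1/\sqrt{T}\), in line with Corollary 8.1 with \(\rho_{\mathbf{X},1}\left(\tau\right)=0\) for \(\tau\neq 0\).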

Assume that \(\mathbf{X}\) is a weak-sense stationary process of order \(4\).

Definition 8.6 (Ergodicity in the variance-covariance) We say that the process \(\mathbf{X}\) is probability [resp. mean-square] ergodic in the variance-covariance if \[\begin{equation} S_{\mathbf{X},T}^{2}\overset{\mathbf{P}}{\rightarrow}\Sigma_{\mathbf{X}}^{2} \quad\text{[resp. }S_{\mathbf{X},T}^{2}\overset{\mathbf{L}^{2}}{\rightarrow}\Sigma_{\mathbf{X}}^{2}\text{]}, \end{equation}\] as \(T\rightarrow+\infty\).

Definition 8.7 (Ergodicity in the autocovariance) We say that the process \(\mathbf{X}\) is probability [resp. mean-square] ergodic in the autocovariance if \[\begin{equation} C_{\mathbf{X},T}\left(\tau\right)\overset{\mathbf{P}}{\rightarrow} \Gamma_{\mathbf{X},1}\left(\tau\right) \quad\text{or}\quad C_{\mathbf{X},T}\left(\tau\right)\overset{\mathbf{P}}{\rightarrow} \Gamma_{\mathbf{X},0}\left(\tau\right) \quad\text{[resp. }C_{\mathbf{X},T}\left(\tau\right)\overset{\mathbf{L}^{2}}{\rightarrow} \Gamma_{\mathbf{X},1}\left(\tau\right) \quad\text{or}\quad C_{\mathbf{X},T}\left(\tau\right)\overset{\mathbf{L}^{2}}{\rightarrow} \Gamma_{\mathbf{X},0}\left(\tau\right)\text{]}, \end{equation}\] according to whether \(\mathbb{T}\equiv\mathbb{N}\) or \(\mathbb{T}\equiv\mathbb{Z}\), for every \(\tau\in\mathbb{T}_{0}\), as \(T\rightarrow+\infty\).

If the process \(\mathbf{X}\) is ergodic in the autocovariance, then \(\mathbf{X}\) is also ergodic in the variance-covariance.

Definition 8.8 (Ergodicity in the wide sense) We say that the process \(\mathbf{X}\) is probability [resp. mean-square] ergodic in the wide sense if it is ergodic both in the mean and in the autocovariance.

Theorem 8.4 (Sufficient condition for ergodicity in the wide sense) Assume \(\mathbb{T}\equiv\mathbb{N}\) and \[\begin{equation} \lim_{t\rightarrow+\infty}Cov\left(X_{1}X_{\tau},X_{t}X_{t+\tau-1}\right)=0, \quad\forall \tau=1,2,\dots \end{equation}\] Then \(\mathbf{X}\) is mean-square ergodic in the wide sense.

There is no way to decide, by direct inspection, whether we can think of a time series \(\mathbf{x}\) as the sample-path of an ergodic process \(\mathbf{X}\). However, the ergodicity of the process \(\mathbf{X}\) chosen as a model for the time series \(\mathbf{x}\) is necessary to make inferences from \(\mathbf{x}\). As the results presented above show, ergodicity is related to the asymptotic independence of the random variables in the process. This is a rather natural property of several noise processes: it is natural to think that, as time goes on, the influence of the past states of the noise affecting a stochastic phenomenon on its current states vanishes.

The following plot illustrates the inclusion relationships between SSS processes, processes of order \(2\), WSS processes, and ergodic processes.

library(ggplot2)
library(ggforce)  # provides geom_circle()

# Vertices of the nested polygons and the centre of the ergodic-processes circle
Data_df=data.frame(x1=c(0,0,1,2,3,3,2,1,0), y1=c(2,3,4,4,3,2,1,1,2), 
             x2=c(1.5,2,3,4.5,3,2,1.5,NA,NA), y2=c(2.5,4,7,2.5,-2,1,2.5,NA,NA),
             x3=c(1.5,2,3,3.67,3,2,NA,NA,NA), y3=c(2.5,4,4,2.5,1,1,NA,NA,NA),
             x4=c(3,rep(NA,8)), y4=c(2.5,rep(NA,8)))
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Inclusion Relationship Between Strong-Sense Stationary Processes, Processes of Order 2, Weak-Sense Stationary Processes, and Ergodic Processes"))
caption_content <- "Author: Roberto Monte"
x_breaks <- c(0:5)
x_labs <- format(x_breaks, scientific=FALSE)
y_breaks <- c(0:5)
y_labs <- format(y_breaks, scientific=FALSE)
fill_r <- bquote("SSS processes")
fill_g <- bquote("Processes of order 2")
fill_b <- bquote("WSS processes")
y_m_fill <- bquote("Ergodic processes")
leg_fill_labs <- c(fill_r, fill_g, fill_b,y_m_fill)
leg_fill_breaks <- c("fill_r", "fill_g", "fill_b", "y_m_fill")
leg_fill_cols <- c("fill_r"="orangered", "fill_g"="green", "fill_b"="blue", "y_m_fill"="magenta")
    ggplot() +
    geom_path(data=Data_df, mapping=aes(x=x1, y=y1), colour="red") +             
    geom_path(data=Data_df, mapping=aes(x=x2, y=y2), colour="green", na.rm=TRUE) +
    geom_path(data=Data_df, mapping=aes(x=x3, y=y3), colour="blue", na.rm=TRUE) +             
    geom_polygon(data=Data_df, mapping=aes(x=x1, y=y1, fill="fill_r"), alpha=0.4) +             
    geom_polygon(data=Data_df, mapping=aes(x=x2, y=y2, fill="fill_g"), alpha=0.4) +
    geom_polygon(data=Data_df, mapping=aes(x=x3, y=y3, fill="fill_b"), alpha=0.4) +
    geom_circle(data=Data_df, mapping=aes(x0=x4, y0=y4, r=0.30, fill="y_m_fill"), colour="magenta", alpha=0.3, na.rm=TRUE) +
    scale_x_continuous(name="", breaks=x_breaks, labels=x_labs) +
    scale_y_continuous(name="", breaks=y_breaks, labels=y_labs) +
    ggtitle(title_content) +
    labs(caption=caption_content) +  
    scale_fill_manual(name="Legend", labels=leg_fill_labs, values=leg_fill_cols, breaks=leg_fill_breaks) +
    theme(plot.title=element_text(hjust = 0.5),
    plot.caption = element_text(hjust = 1.0),
    legend.key.width = unit(0.8,"cm"), legend.position="bottom")

9 Strong White Noises

Let \(\mathbb{T}\subseteq\mathbb{Z}\) and let \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) be a stochastic process on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with state space \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\).

Definition 9.1 (Strong white noise) We say that \(\mathbf{W}\) is a strong white noise (SWN) or an independent identically distributed noise (IIDN) if \(\mathbf{W}\) has order \(2\) and the random variables in \(\mathbf{W}\) are independent and identically distributed with mean \(0\). Note that some authors weaken the order \(2\) prescription to order \(1\). However, the weaker prescription is not very relevant in the context of the essentials of time series analysis. On the other hand, as we will see below, the order \(2\) prescription makes strong white noises a special case of weak white noises.

To denote that \(\mathbf{W}\) is a strong white noise with state space \(\mathbb{R}^{N}\) we write \(\mathbf{W}\sim SWN^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\) or \(\mathbf{W}\sim IID^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\), where \(\Sigma_{\mathbf{W}}^{2}\) is the common variance-covariance matrix of the random variables in \(\mathbf{W}\). In case \(N=1\), we set \(\Sigma_{\mathbf{W}}^{2}\equiv\sigma_{\mathbf{W}}^{2}\) and, neglecting the reference to \(N\), we write \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\) or \(\mathbf{W}\sim IID\left(\sigma_{\mathbf{W}}^{2}\right)\).

As an immediate consequence of Definition 9.1 and Corollary 6.3, a SWN is a SSS process of order \(2\). Hence, it is also a WSS process.

Let \(\mathbf{W}\sim SWN^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\).

The mean [resp. variance-covariance] function \(\mu_{\mathbf{W}}:\mathbb{T}\rightarrow\mathbb{R}^{N}\) [resp. \(\Sigma_{\mathbf{W}}^{2}:\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\)] satisfies \[\begin{equation} \mu_{\mathbf{W}}\left(t\right)=0,\quad\text{[resp. }\Sigma_{\mathbf{W}}^{2}\left(t\right)=\Sigma_{\mathbf{W}}^{2}\text{]}, \tag{9.1} \end{equation}\] for every \(t\in\mathbb{T}\). The autocovariance function \(\Gamma_{\mathbf{W}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) and the autocorrelation function \(\mathrm{P}_{\mathbf{W}}:\mathbb{T}\times\mathbb{T}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) satisfy \[\begin{equation} \Gamma_{\mathbf{W}}\left(s,t\right)=\mathrm{P}_{\mathbf{W}}\left(s,t\right)=0, \tag{9.2} \end{equation}\] for all \(s,t\in\mathbb{T}\) such that \(s\neq t\).

Fixed any \(t_{0}\in\mathbb{T}\), set \(\mathbb{T}_{0}\equiv\left\{\tau\in\mathbb{R}:t_{0}+\tau\in\mathbb{T}\right\}\). The reduced autocovariance function \(\Gamma_{\mathbf{W},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) and the reduced autocorrelation function \(\mathrm{P}_{\mathbf{W},t_{0}}:\mathbb{T}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) referred to \(t_{0}\) satisfy \[\begin{equation} \Gamma_{\mathbf{W},t_{0}}\left(\tau\right)=\left\{ \begin{array} [c]{ll} \Sigma_{\mathbf{W}}^{2}, & \text{if }\tau=0,\\ 0, & \text{if }\tau\neq 0, \end{array} \right. \quad\text{and}\quad \mathrm{P}_{\mathbf{W},t_{0}}\left(\tau\right)=\left\{ \begin{array} [c]{ll} I_{N}, & \text{if }\tau=0,\\ 0, & \text{if }\tau\neq 0, \end{array} \right. \tag{9.3} \end{equation}\] where \(I_{N}\) is the identity matrix in \(\mathbb{R}^{N}\times\mathbb{R}^{N}\).

Assume that \(\mathbb{T}\equiv\mathbb{Z}\). Then \(\mathbb{T}_{0}=\mathbb{Z}\) and the reduced partial autocorrelation function \(\Psi_{\mathbf{W},t_{0}}:\mathbb{Z}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) referred to \(t_{0}\) satisfies \[\begin{equation} \Psi_{\mathbf{W},t_{0}}\left(\tau\right)=0, \end{equation}\] for every \(\tau\neq 0\).

Proposition 9.1 (SWN as an ergodic process) Assume that \(\mathbb{T}\equiv\mathbb{N}\) or \(\mathbb{T}\equiv\mathbb{Z}\). Then the process \(\mathbf{W}\) is mean-square ergodic in the wide sense.

Proof. The claim clearly follows from the Slutsky theorem (see Theorem 8.2) and Equation (9.3).

Definition 9.2 (Gaussian white noise) We say that \(\mathbf{W}\) is a Gaussian white noise if \(\mathbf{W}\) is a strong white noise and the random variables in \(\mathbf{W}\) are Gaussian distributed.

To denote that \(\mathbf{W}\) is a Gaussian white noise with state space \(\mathbb{R}^{N}\) we write \(\mathbf{W}\sim GWN^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\), where \(\Sigma_{\mathbf{W}}^{2}\) is the common variance-covariance matrix of the random variables in \(\mathbf{W}\). In case \(N=1\), we set \(\Sigma_{\mathbf{W}}^{2}\equiv\sigma_{\mathbf{W}}^{2}\) and, neglecting the reference to \(N\), we write \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\).

Proposition 9.2 (GWN as a Gaussian process) Assume that \(\mathbf{W}\sim GWN^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\). Then \(\mathbf{W}\) is a Gaussian process.

Proof. The claim clearly follows recalling that any finite set of Gaussian distributed and independent random variables has a joint Gaussian distribution.

Let \(\mathbf{W}\sim SWN\left(\sigma^{2}_{\mathbf{W}}\right)\). For simplicity, assume that \(\mathbb{T}\equiv\mathbb{N}\). Let \(T>0\) and, for any \(\tau=1,\dots,T-1\), let \(C_{\mathbf{W},T}\left(\tau\right)\) [resp. \(R_{\mathbf{W},T}\left(\tau\right)\)] be the autocovariance [resp. autocorrelation] estimator of size \(T\) of \(\mathbf{W}\) for the shift \(\tau\).

Theorem 9.1 (Autocovariance and autocorrelation of a SWN) Given any \(S\in\mathbb{N}\), write \(I_{S}\) for the identity matrix of order \(S\). We have

  1. for large \(T\), the vector \(\left(C_{\mathbf{W},T}\left(1\right),\dots,C_{\mathbf{W},T}\left(S\right)\right)^{\intercal}\) is approximately distributed as \(N\left(0,\frac{\sigma_{\mathbf{W}}^{4}}{T}I_{S}\right)\); that is, \(\sqrt{T}\left(C_{\mathbf{W},T}\left(1\right),\dots,C_{\mathbf{W},T}\left(S\right)\right)^{\intercal}\) converges in distribution to \(N\left(0,\sigma_{\mathbf{W}}^{4}I_{S}\right)\) as \(T\rightarrow\infty\);

  2. for large \(T\), the vector \(\left(R_{\mathbf{W},T}\left(1\right),\dots,R_{\mathbf{W},T}\left(S\right)\right)^{\intercal}\) is approximately distributed as \(N\left(0,\frac{1}{T}I_{S}\right)\); that is, \(\sqrt{T}\left(R_{\mathbf{W},T}\left(1\right),\dots,R_{\mathbf{W},T}\left(S\right)\right)^{\intercal}\) converges in distribution to \(N\left(0,I_{S}\right)\) as \(T\rightarrow\infty\).

Assume that \(T\) is large (as a rule of thumb of the pre-computer age, \(T\) is to be considered large when \(T\geq 30\) according to some scholars, or when \(T\geq 40\) according to others) and fix any \(S\in\mathbb{N}\) such that \(S\ll T\) (e.g. \(S\equiv\min \left\{20, T-1\right\}\), see Box, G.E.P., Jenkins, G.M., and Reinsel, G.C., Time Series Analysis: Forecasting and Control, 3rd Ed., Englewood Cliffs, NJ, Prentice Hall (1994), or \(S\equiv\lfloor \log\left(T\right)\rfloor\), see Tsay, R.S., Analysis of Financial Time Series, 2nd Ed., Hoboken, NJ, John Wiley & Sons Inc. (2005)). As a consequence of Theorem 9.1, the vector \(\left(R_{\mathbf{W},T}\left(1\right),\dots,R_{\mathbf{W},T}\left(S\right)\right)^{\intercal}\) is approximately Gaussian distributed with mean \(0\) and variance-covariance matrix \(\frac{1}{T}I_{S}\). It follows that, given any \(\alpha\in\left(0,1\right)\), an approximate confidence interval at the confidence level (c.l.) of \(100\left(1-\alpha\right)\%\) for the realization \[\begin{equation} r_{\mathbf{W},T}\left(\tau\right)\equiv \frac{\sum\limits_{t=1}^{T-\tau}\left(w_{t}-\bar{\mathbf{w}}_{T}\right) \left(w_{t+\tau}-\bar{\mathbf{w}}_{T}\right)} {\sum\limits_{t=1}^{T}\left(w_{t}-\bar{\mathbf{w}}_{T}\right)^{2}} \end{equation}\] of the statistic \(R_{\mathbf{W},T}\left(\tau\right)\), corresponding to a sample path \(\left(w_{t}\right)_{t=1}^{T}\equiv\mathbf{w}\) of \(\mathbf{W}\), is given by \[\begin{equation} \left(-\frac{z_{\alpha/2}}{\sqrt{T}},\ \frac{z_{\alpha/2}}{\sqrt{T}}\right), \end{equation}\] for every \(\tau=1,\dots,S\), where \(z_{\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of \(Z\sim N\left(0,1\right)\). In addition, there is evidence against the null hypothesis \(H_{0}:\mathbf{E}\left[R_{\mathbf{W},T}\left(\tau\right)\right]=0\), at the approximate significance level (s.l.)
of \(100\alpha\%\), when \[\begin{equation} \left\vert r_{\mathbf{W},T}\left(\tau\right)\right\vert >\frac{z_{\alpha/2}}{\sqrt{T}} \Leftrightarrow \mathbf{P}\left(\left\vert Z\right\vert \geq\sqrt{T}\left\vert r_{\mathbf{W},T}\left(\tau\right)\right\vert \right)<\alpha. \end{equation}\] In particular, we have the approximate confidence intervals \[\begin{equation} \begin{array} [c]{ll} \left(-\frac{1.645}{\sqrt{T}},\ \frac{1.645}{\sqrt{T}}\right)& \text{at }90\%\text{ c.l.}\\ \left(-\frac{1.96}{\sqrt{T}},\ \frac{1.96}{\sqrt{T}}\right)& \text{at }95\%\text{ c.l.}\\ \left(-\frac{2.575}{\sqrt{T}},\ \frac{2.575}{\sqrt{T}}\right)& \text{at }99\%\text{ c.l.} \end{array} \end{equation}\] and there is evidence against the null hypothesis when \[\begin{equation} \begin{array} [c]{ll} \left\vert r_{\mathbf{W},T}\left(\tau\right)\right\vert >\frac{1.645}{\sqrt{T}} \Leftrightarrow\mathbf{P} \left(\left\vert Z\right\vert\geq \sqrt{T}\left\vert r_{\mathbf{W},T}\left(\tau\right)\right\vert\right)<0.1 & \text{at nearly }10\%\text{ s.l.};\\ \left\vert r_{\mathbf{W},T}\left(\tau\right)\right\vert >\frac{1.96}{\sqrt{T}} \Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert\geq \sqrt{T}\left\vert r_{\mathbf{W},T}\left(\tau\right)\right\vert\right)<0.05 & \text{at nearly }5\%\text{ s.l.};\\ \left\vert r_{\mathbf{W},T}\left(\tau\right)\right\vert >\frac{2.575}{\sqrt{T}} \Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert\geq \sqrt{T}\left\vert r_{\mathbf{W},T}\left(\tau\right)\right\vert\right)<0.01 & \text{at nearly }1\%\text{ s.l.}. \end{array} \end{equation}\]
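The approximate bands above are easy to check numerically. The following minimal sketch (in Python, with numpy assumed available; all helper names are ours, not part of the notes) simulates a Gaussian white noise path, computes the realized autocorrelations \(r_{\mathbf{W},T}\left(\tau\right)\) exactly as in the display above, and counts how many fall outside the approximate \(95\%\) bands \(\pm 1.96/\sqrt{T}\).

```python
# Sketch: sample autocorrelations of a simulated Gaussian white noise path
# compared with the approximate 95% bands +/- 1.96/sqrt(T).
import numpy as np

rng = np.random.default_rng(0)
T, S = 500, 20
w = rng.normal(size=T)          # sample path of a GWN(1)

def sample_acf(x, max_lag):
    """Realized autocorrelations r(1), ..., r(max_lag), as in the display above."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    denom = np.sum((x - xbar) ** 2)
    return np.array([np.sum((x[:-tau] - xbar) * (x[tau:] - xbar)) / denom
                     for tau in range(1, max_lag + 1)])

r = sample_acf(w, S)
band = 1.96 / np.sqrt(T)        # approximate 95% c.l. bounds
outside = np.sum(np.abs(r) > band)
# Under the SWN hypothesis we expect roughly 5% of the S values outside the band.
print(outside, S)
```

Under the null hypothesis roughly \(100\alpha\%\) of the \(S\) realized autocorrelations should fall outside the band, which motivates the corollary below.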

Corollary 9.1 (SWN hypothesis test) Given any \(\alpha\in\left(0,1\right)\), assume that more than \(100\alpha\%\) of the realizations of the time autocorrelations \(r_{\mathbf{W},T}\left(\tau\right)\) of length \(T\), corresponding to a sample path \(\left(w_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{w}\) of \(\mathbf{W}\), fall outside the interval \[\begin{equation} \left(-\frac{z_{\alpha/2}}{\sqrt{T}},\ \frac{z_{\alpha/2}}{\sqrt{T}}\right), \end{equation}\] as the time shift \(\tau=1,\dots,S\) varies, where \(S\ll T\). Then, there is evidence against the null hypothesis that \(\mathbf{w}\) is a sample path of a SWN at the approximate significance level of \(100\alpha\%\).

Let \(\left(X_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{X}\) be a real stochastic process and let \(S,T\in\mathbb{N}\) such that \(S<T\).

Definition 9.3 (Ljung-Box statistic) We call Ljung-Box statistic of length \(S\) on \(\mathbf{X}\) the statistic \[\begin{equation} Q_{\mathbf{X},T}\left(S\right)\overset{\text{def}}{=}T\left(T+2\right) \sum\limits_{\tau=1}^{S}\frac{R_{\mathbf{X},T}^{2}\left(\tau\right)}{T-\tau}, \end{equation}\] where, as above, \(R_{\mathbf{X},T}\left(\tau\right)\) is the time autocorrelation estimator of size \(T\) of \(\mathbf{X}\) for the shift \(\tau\).

Theorem 9.2 (Ljung-Box test) Assume that \(T\) is large and \(S\ll T\). Then, under the null hypothesis \(H_{0}:\mathbf{X}\sim SWN\left(\sigma_{\mathbf{X}}^{2}\right)\), the Ljung-Box statistic \(Q_{\mathbf{X},T}\left(S\right)\) has approximately the Chi-square distribution with \(S\) degrees of freedom. In symbols, \(Q_{\mathbf{X},T}\left(S\right)\sim\chi_{S}^{2}\). As a consequence, there is evidence against the null hypothesis at the approximate significance level of \(100\alpha\%\), for any \(\alpha\in\left(0,1\right)\), when \[\begin{equation} q_{\mathbf{X},T}\left(S\right)>\chi_{u,\alpha,S}^{2}\Leftrightarrow\mathbf{P} \left(\chi_{S}^{2}\geq q_{\mathbf{X},T}\left(S\right)\right)<\alpha, \end{equation}\] where \(q_{\mathbf{X},T}\left(S\right)\) is the realization of the statistic \(Q_{\mathbf{X},T}\left(S\right)\) corresponding to a sample path \(\left(x_{t}\right)_{t=1}^{T}\equiv\mathbf{x}\) of \(\mathbf{X}\) and \(\chi_{u,\alpha,S}^{2}\) is the upper \(\alpha\)-critical value of the Chi-square distribution with \(S\) degrees of freedom.

Note that the Ljung-Box test is a portmanteau test: the null hypothesis is well specified, but the alternative hypothesis is only loosely specified. In fact, when there is evidence against the null hypothesis \(H_{0}\), the alternative merely suggests that the data are not likely to be generated by an independent sequence of random variables, due to the apparent presence of serial correlation.
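The computation of the Ljung-Box statistic and of its approximate \(\chi_{S}^{2}\) p-value can be sketched as follows (a Python illustration assuming numpy and scipy; the helper `ljung_box` is ours). The statistic uses the squared sample autocorrelations, as in Definition 9.3.

```python
# Sketch of the Ljung-Box portmanteau test: Q = T(T+2) * sum r(tau)^2 / (T - tau),
# compared against the chi-square distribution with S degrees of freedom.
import numpy as np
from scipy import stats

def ljung_box(x, S):
    """Return the Ljung-Box statistic of length S and its chi2_S p-value."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xbar = x.mean()
    denom = np.sum((x - xbar) ** 2)
    r = np.array([np.sum((x[:-tau] - xbar) * (x[tau:] - xbar)) / denom
                  for tau in range(1, S + 1)])
    q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, S + 1)))
    return q, stats.chi2.sf(q, df=S)     # p-value P(chi2_S >= q)

rng = np.random.default_rng(1)
w = rng.normal(size=300)                 # plausible SWN path: large p-value expected
q_wn, p_wn = ljung_box(w, S=10)

x = np.cumsum(w)                         # random walk path: strong serial correlation
q_rw, p_rw = ljung_box(x, S=10)
print(p_wn, p_rw)                        # p_rw is essentially zero
```

On the random walk path the statistic is huge and the null hypothesis is rejected at any usual significance level, while on the white noise path there is typically no evidence against it.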

9.1 Parameter Estimation

Let \(\mathbb{T}\equiv\mathbb{N}\) or \(\mathbb{T}\equiv\mathbb{Z}\) and let \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) be a strong white noise with state space \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\), that is \(SWN^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\) for a positive definite symmetric matrix \(\Sigma_{\mathbf{W}}^{2}\in\mathbb{R}^{N}\times\mathbb{R}^{N}\). Let \(\bar{\mathbf{W}}_{T}\) [resp. \(S_{\mathbf{W},T}^{2}\), resp. \(C_{\mathbf{W},T}\), resp. \(R_{\mathbf{W},T}\)] be the time average [resp. time variance, resp. autocovariance, resp. time autocorrelation] estimator of size \(T\) of \(\mathbf{W}\) (see Definitions 8.1, 8.2, 8.3, and 8.4).

Proposition 9.3 (Time average of a SWN) The statistic \(\bar{\mathbf{W}}_{T}\) is an unbiased estimator of \(\mu_{\mathbf{W}}=0\) with mean squared error given by \(Var\left(\bar{\mathbf{W}}_{T}\right)\).

Proposition 9.4 (Approximate confidence interval and hypothesis test for the time average of a SWN) Assume that \(T\) is large. Then an approximate confidence interval for \(\mu_{\mathbf{W}}=0\), at the confidence level (c.l.) of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(\bar{\mathbf{W}}_{T}-z_{\alpha/2}\frac{S_{\mathbf{W},T}}{\sqrt{T}},\ \bar{\mathbf{W}}_{T}+z_{\alpha/2}\frac{S_{\mathbf{W},T}}{\sqrt{T}}\right), \tag{9.4} \end{equation}\] where \(S_{\mathbf{W},T}\equiv\sqrt{S_{\mathbf{W},T}^{2}}\) is the time standard deviation of \(\mathbf{W}\) and the positive number \(z_{\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of \(Z\sim N\left(0,1\right)\). Hence, a realization of the confidence interval (9.4) is given by \[\begin{equation} \left(\bar{\mathbf{w}}_{T}-z_{\alpha/2}\frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{\mathbf{w}}_{T}+z_{\alpha/2}\frac{s_{\mathbf{W},T}}{\sqrt{T}}\right), \end{equation}\] where \(\bar{\mathbf{w}}_{T}\) [resp. \(s_{\mathbf{W},T}\)] is the realization of the estimator \(\bar{\mathbf{W}}_{T}\) [resp. \(S_{\mathbf{W},T}\)] corresponding to a sample path \(\left(w_{t}\right)_{t=1}^{T}\equiv\mathbf{w}\) of \(\mathbf{W}\). In addition, there is evidence against the null hypothesis \(H_{0}:\mu_{\mathbf{W}}=0\) at the approximate significance level (s.l.) of \(100\alpha\%\), for any \(\alpha\in\left(0,1\right)\), when \[\begin{equation} \left\vert\frac{\bar{\mathbf{w}}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert >z_{\alpha/2}\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{\bar{\mathbf{w}}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert\right)<\alpha.
\end{equation}\] In particular, we have approximately \[ \begin{array} [c]{ll} \left(\bar{\mathbf{w}}_{T}-1.645\frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{\mathbf{w}}_{T}+1.645\frac{s_{\mathbf{W},T}}{\sqrt{T}}\right) & \text{at }90\%\text{ c.l.}\\ \left(\bar{\mathbf{w}}_{T}-1.96\frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{\mathbf{w}}_{T}+1.96\frac{s_{\mathbf{W},T}}{\sqrt{T}}\right) & \text{at }95\%\text{ c.l.}\\ \left(\bar{\mathbf{w}}_{T}-2.575\frac{s_{\mathbf{W},T}}{\sqrt{T}},\ \bar{\mathbf{w}}_{T}+2.575\frac{s_{\mathbf{W},T}}{\sqrt{T}}\right) & \text{at }99\%\text{ c.l.} \end{array} \] and there is evidence against the null hypothesis \(H_{0}:\mu_{\mathbf{W}}=0\) when \[ \begin{array} [c]{l} \left\vert \frac{\bar{\mathbf{w}}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert >1.645\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{\bar{\mathbf{w}}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert \right)<0.1 & \text{at nearly }10\%\text{ s.l.}\\ \left\vert \frac{\bar{\mathbf{w}}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert >1.96\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{\bar{\mathbf{w}}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert \right) <0.05 & \text{at nearly }5\%\text{ s.l.}\\ \left\vert \frac{\bar{\mathbf{w}}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert >2.575\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{\bar{\mathbf{w}}_{T}\sqrt{T}}{s_{\mathbf{W},T}}\right\vert \right) <0.01 & \text{at nearly }1\%\text{ s.l.}. \end{array} \]
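The interval and test of Proposition 9.4 can be computed as follows (a Python sketch assuming numpy and scipy are available; all variable names are ours).

```python
# Sketch of the approximate interval wbar_T +/- z_{alpha/2} * s_T / sqrt(T)
# and the matching two-sided z-test for H0: mu_W = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T = 400
w = rng.normal(loc=0.0, scale=2.0, size=T)   # simulated SWN(4) sample path

wbar = w.mean()
s = w.std()                                  # time standard deviation (divisor T)
alpha = 0.05
z = stats.norm.isf(alpha / 2)                # upper tail critical value z_{alpha/2}

ci = (wbar - z * s / np.sqrt(T), wbar + z * s / np.sqrt(T))
test_stat = abs(wbar * np.sqrt(T) / s)
reject = test_stat > z                       # evidence against H0: mu_W = 0
print(ci, reject)
```

Since the simulated noise has mean \(0\), the test typically (about \(95\%\) of the time) finds no evidence against \(H_{0}\).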

Proposition 9.5 (Confidence intervals and hypothesis test for the time average of a GWN) Assume that \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\). Then a confidence interval for \(\mu_{\mathbf{W}}=0\), at the confidence level (c.l.) of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(\bar{\mathbf{W}}_{T}-t_{T-1,\alpha/2}\frac{S_{\mathbf{W},T}}{\sqrt{T-1}},\ \bar{\mathbf{W}}_{T}+t_{T-1,\alpha/2}\frac{S_{\mathbf{W},T}}{\sqrt{T-1}}\right), \end{equation}\] where \(S_{\mathbf{W},T}\equiv\sqrt{S_{\mathbf{W},T}^{2}}\) is the time standard deviation of \(\mathbf{W}\) and the positive number \(t_{T-1,\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the Student random variable \(t_{T-1}\) with \(T-1\) degrees of freedom. Hence, a realization of this confidence interval is given by \[\begin{equation} \left(\bar{\mathbf{w}}_{T}-t_{T-1,\alpha/2}\frac{s_{\mathbf{W},T}}{\sqrt{T-1}},\ \bar{\mathbf{w}}_{T}+t_{T-1,\alpha/2}\frac{s_{\mathbf{W},T}}{\sqrt{T-1}}\right), \end{equation}\] where \(\bar{\mathbf{w}}_{T}\) [resp. \(s_{\mathbf{W},T}\)] is the realization of the estimator \(\bar{\mathbf{W}}_{T}\) [resp. \(S_{\mathbf{W},T}\)] corresponding to a sample path \(\left(w_{t}\right)_{t=1}^{T}\equiv\mathbf{w}\) of \(\mathbf{W}\). In addition, there is evidence against the null hypothesis \(H_{0}:\mu_{\mathbf{W}}=0\) at the significance level of \(100\alpha\%\), for any \(\alpha\in\left(0,1\right)\), when \[\begin{equation} \left\vert \frac{\bar{\mathbf{w}}_{T}}{s_{\mathbf{W},T}/\sqrt{T-1}}\right\vert >t_{T-1,\alpha/2}\Leftrightarrow\mathbf{P}\left(\left\vert t_{T-1} \right\vert \geq\left\vert \frac{\bar{\mathbf{w}}_{T}}{s_{\mathbf{W},T}/\sqrt{T-1}}\right\vert \right) <\alpha. \end{equation}\]

Proposition 9.6 (Time variance estimator of a SWN) Assume that \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\) is a process of order \(4\). Then the statistic \(S^{2}_{\mathbf{W},T}\) is a biased estimator of \(\sigma_{\mathbf{W}}^{2}\) with mean squared error given by \[\begin{equation} \mathbf{MSE}\left(S^{2}_{\mathbf{W},T}\right)= \frac{\left(T-1\right)^{2}}{T^{2}}\frac{\sigma_{\mathbf{W}}^{4}}{T} \left(\frac{\mu_{\mathbf{W}}^{4}}{\sigma_{\mathbf{W}}^{4}}-\frac{T-3}{T-1}\right) +\frac{\sigma_{\mathbf{W}}^{4}}{T^{2}}, \tag{9.5} \end{equation}\] where \(\mu_{\mathbf{W}}^{4}\) denotes the common fourth central moment of the random variables in \(\mathbf{W}\).

Proposition 9.7 (Approximate confidence intervals and hypothesis test for the time variance of a SWN) Assume that \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\) is a process of order \(4\) and \(T\) is “large”. Then an approximate confidence interval for \(\sigma_{\mathbf{W}}^{2}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(\frac{S^{2}_{\mathbf{W},T}}{1+z_{\alpha/2}\sqrt{\left(Kurt_{\mathbf{W},T}-1\right)/T}},\ \frac{S^{2}_{\mathbf{W},T}}{1-z_{\alpha/2}\sqrt{\left(Kurt_{\mathbf{W},T}-1\right)/T}}\right), \tag{9.6} \end{equation}\] where \(S_{\mathbf{W},T}\equiv\sqrt{S_{\mathbf{W},T}^{2}}\) [resp. \(Kurt_{\mathbf{W},T}\)] is the time standard deviation [resp. time kurtosis] of \(\mathbf{W}\) and the positive number \(z_{\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of \(Z\sim N\left(0,1\right)\). Hence, a realization of the confidence interval (9.6) is given by \[\begin{equation} \left(\frac{s^{2}_{\mathbf{W},T}}{1+z_{\alpha/2}\sqrt{\left(kurt_{\mathbf{W},T}-1\right)/T}},\ \frac{s^{2}_{\mathbf{W},T}}{1-z_{\alpha/2}\sqrt{\left(kurt_{\mathbf{W},T}-1\right)/T}}\right), \tag{9.7} \end{equation}\] where \(s^{2}_{\mathbf{W},T}\) [resp. \(kurt_{\mathbf{W},T}\)] is the realization of the estimator \(S^{2}_{\mathbf{W},T}\) [resp. \(Kurt_{\mathbf{W},T}\)] corresponding to a sample path \(\left(w_{t}\right)_{t=1}^{T}\equiv\mathbf{w}\) of \(\mathbf{W}\).
In addition, considering any \(\sigma>0\), there is evidence against the null hypothesis \(H_{0}:\sigma_{\mathbf{W}}=\sigma\) at the approximate significance level of \(100\alpha\%\), for any \(\alpha\in\left(0,1\right)\), when \[\begin{equation} \left\vert \frac{s^{2}_{\mathbf{W},T}-\sigma^{2}}{\sigma^{2} \sqrt{\left(kurt_{\mathbf{W},T}-1\right)/T}}\right\vert >z_{\alpha/2}\Leftrightarrow\mathbf{P}\left(\left\vert Z\right\vert \geq\left\vert \frac{s^{2}_{\mathbf{W},T}-\sigma^{2}}{\sigma^{2} \sqrt{\left(kurt_{\mathbf{W},T}-1\right)/T}}\right\vert\right)<\alpha. \end{equation}\]
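A realization of the kurtosis-based interval can be sketched as follows (Python, assuming numpy and scipy; the variable names are ours, and the endpoints are written with the lower bound first).

```python
# Sketch of the approximate kurtosis-based interval for sigma_W^2:
# ( s^2 / (1 + z*sqrt((kurt-1)/T)), s^2 / (1 - z*sqrt((kurt-1)/T)) ).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
T = 1000
w = rng.normal(size=T)                          # SWN(1) path; true sigma^2 = 1

s2 = w.var()                                    # time variance (divisor T)
kurt = np.mean((w - w.mean()) ** 4) / s2 ** 2   # time kurtosis Kurt_{W,T}
alpha = 0.05
z = stats.norm.isf(alpha / 2)
half = z * np.sqrt((kurt - 1) / T)

ci = (s2 / (1 + half), s2 / (1 - half))         # lower endpoint first
print(ci)
```

For Gaussian data the time kurtosis is close to \(3\), so the relative half-width is about \(z_{\alpha/2}\sqrt{2/T}\).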

Proposition 9.8 (Confidence intervals and hypothesis test for the time variance of a GWN) Assume that \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\). Then a confidence interval for \(\sigma_{\mathbf{W}}^{2}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(\frac{TS_{\mathbf{W},T}^{2}}{\chi_{u,\alpha/2,T-1}^{2}}, \frac{TS_{\mathbf{W},T}^{2}}{\chi_{\ell,\alpha/2,T-1}^{2}}\right) \tag{9.8} \end{equation}\] where the real number \(\chi_{u,\alpha/2,T-1}^{2}\) [resp. \(\chi_{\ell,\alpha/2,T-1}^{2}\)] is the upper [resp. lower] \(\alpha/2\)-critical value of the chi-square random variable \(\chi_{T-1}^{2}\) with \(T-1\) degrees of freedom. Hence, a realization of the confidence interval (9.8) is given by \[\begin{equation} \left(\frac{Ts_{\mathbf{W},T}^{2}}{\chi_{u,\alpha/2,T-1}^{2}}, \frac{Ts_{\mathbf{W},T}^{2}}{\chi_{\ell,\alpha/2,T-1}^{2}}\right) \tag{9.9} \end{equation}\] where \(s^{2}_{\mathbf{W},T}\) is the realization of the estimator \(S^{2}_{\mathbf{W},T}\) corresponding to a sample path \(\left(w_{t}\right)_{t=1}^{T}\equiv\mathbf{w}\) of \(\mathbf{W}\). In addition, there is evidence against the null hypothesis \(H_{0}:\sigma_{\mathbf{W}}=\sigma\) at the significance level of \(100\alpha\%\), for any \(\alpha\in\left(0,1\right)\), when \[\begin{equation} \frac{Ts^{2}_{\mathbf{W},T}}{\sigma^{2}}<\chi_{\ell,\alpha/2,T-1}^{2} \text{ or }\frac{Ts^{2}_{\mathbf{W},T}}{\sigma^{2}}>\chi_{u,\alpha/2,T-1}^{2} \Leftrightarrow \min\left\{\mathbf{P}\left(\chi_{T-1}^{2}<\frac{Ts^{2}_{\mathbf{W},T}}{\sigma^{2}}\right), \mathbf{P}\left(\chi_{T-1}^{2}>\frac{Ts^{2}_{\mathbf{W},T}}{\sigma^{2}}\right)\right\}<\alpha/2. \tag{9.10} \end{equation}\]
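A realization of the chi-square interval (9.9) can be sketched as follows (Python, assuming numpy and scipy; names are ours).

```python
# Sketch of the exact chi-square interval for sigma_W^2 under Gaussianity:
# ( T s^2 / chi2_{u,alpha/2,T-1}, T s^2 / chi2_{l,alpha/2,T-1} ).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T = 200
w = rng.normal(scale=1.5, size=T)             # Gaussian white noise path, sigma^2 = 2.25

s2 = w.var()                                  # time variance (divisor T)
alpha = 0.05
chi_u = stats.chi2.isf(alpha / 2, df=T - 1)   # upper alpha/2-critical value
chi_l = stats.chi2.ppf(alpha / 2, df=T - 1)   # lower alpha/2-critical value

ci = (T * s2 / chi_u, T * s2 / chi_l)
print(ci)
```

Note that the interval always contains the realized time variance \(s^{2}_{\mathbf{W},T}\), since \(\chi_{\ell,\alpha/2,T-1}^{2}<T<\chi_{u,\alpha/2,T-1}^{2}\) for the usual choices of \(\alpha\) and moderately large \(T\).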

Note that a Chi-square test is rather sensitive to deviations from the Gaussian distribution. Unlike the Student distribution, the Chi-square distribution is not robust to deviations from normality of the population distribution. If the white noise distribution is not Gaussian, or close enough to Gaussian, the null hypothesis may be mistakenly rejected.

9.2 Prediction of Future States and Prediction Intervals

Let \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) be a strong white noise on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with state space \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\). For simplicity, assume that \(\mathbb{T}\equiv\mathbb{N}\). Write \(\mu_{\mathbf{W}}\) [resp. \(\Sigma_{\mathbf{W}}^{2}\)] for the constant value of the mean [resp. variance-covariance] function of \(\mathbf{W}\). For any \(S,T\in\mathbb{N}\), write \(W_{T}\) [resp. \(W_{T+S}\)] for the state of the process \(\mathbf{W}\) at the current time \(T\) [resp. at the future time \(T+S\)] and write \(\hat{W}_{T+S\mid T}\) for the minimum mean square error predictor of the future state \(W_{T+S}\) of the process, given the information \(\mathcal{F}_{T}\equiv\sigma\left(W_{1},\dots,W_{T}\right)\) generated by the process \(\mathbf{W}\) itself up to the time \(T\) included.

Proposition 9.9 (Future state predictor of a SWN) The time average estimator \(\bar{\mathbf{W}}_{T}\) is a point estimator for \(W_{T+S}\), for all \(S,T\in\mathbb{N}\).

Proof. Writing \(\mathbf{E}\left[\cdot\mid\mathcal{F}_{T}\right]\) for the conditional expectation operator given the information \(\mathcal{F}_{T}\), we know that \[\begin{equation} \hat{W}_{T+S\mid T}\overset{\text{def}}{=} \underset{Y\in L^{2}\left( \Omega_{\mathcal{F}_{T}};\mathbb{R}^{N}\right)}{\arg\min}\left\{\mathbf{E}\left[\left(W_{T+S}-Y\right)^{2}\right]\right\} =\mathbf{E}\left[W_{T+S}\mid\mathcal{F}_{T}\right], \tag{9.11} \end{equation}\] for all \(S,T\in\mathbb{N}\). Now, since the random variables in \(\mathbf{W}\) are independent and the mean function of \(\mathbf{W}\) is constant, we have \[\begin{equation} \mathbf{E}\left[W_{T+S}\mid\mathcal{F}_{T}\right]=\mathbf{E}\left[W_{T+S}\right]=\mu_{\mathbf{W}}, \tag{9.12} \end{equation}\] for all \(S,T\in\mathbb{N}\). Combining (9.11) and (9.12), we obtain \[\begin{equation} \hat{W}_{T+S\mid T}=\mu_{\mathbf{W}}, \tag{9.13} \end{equation}\] for all \(S,T\in\mathbb{N}\). On the other hand, \(\bar{\mathbf{W}}_{T}\) is a point estimator for \(\mu_{\mathbf{W}}\) and the desired claim follows.

Proposition 9.10 (Prediction intervals for future state of a SWN) In case \(N=1\), assume that \(\mathbf{W}\) is Gaussian, that is \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\), for some \(\sigma_{\mathbf{W}}>0\). Then, a prediction interval for the state \(W_{T+S}\), for any \(S\in\mathbb{N}\), at the confidence level of \(100\left(1-\alpha\right)\%\), is given by \[\begin{equation} \left(\bar{\mathbf{W}}_{T}-t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}},\ \bar{\mathbf{W}}_{T}+t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}\right), \tag{9.14} \end{equation}\] for any \(\alpha\in\left(0,1\right)\), where the positive number \(t_{T-1,\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the Student random variable \(t_{T-1}\) with \(T-1\) degrees of freedom and \(S_{\mathbf{W},T}\equiv\sqrt{S_{\mathbf{W},T}^{2}}\) is the time standard deviation of \(\mathbf{W}\). Hence, a realization of the prediction interval (9.14) is given by \[\begin{equation} \left(\bar{\mathbf{w}}_{T}-t_{T-1,\alpha/2}s_{\mathbf{W},T}\sqrt{1+\frac{1}{T}},\ \bar{\mathbf{w}}_{T}+t_{T-1,\alpha/2}s_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}\right), \tag{9.15} \end{equation}\] where \(\bar{\mathbf{w}}_{T}\) [resp. \(s_{\mathbf{W},T}\)] is the realization of the time average estimator \(\bar{\mathbf{W}}_{T}\) [resp. time standard deviation \(S_{\mathbf{W},T}\)] of \(\mathbf{W}\) corresponding to a sample path \(\left(w_{t}\right)_{t=1}^{T}\equiv\mathbf{w}\) of \(\mathbf{W}\).

Proof. On account of Proposition 9.9 and the Gaussianity assumption on \(\mathbf{W}\), the statistic \[\begin{equation} \frac{W_{T+S}-\hat{W}_{T+S\mid T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}} =\frac{W_{T+S}-\bar{\mathbf{W}}_{T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}}\equiv X \end{equation}\] is a Student random variable \(t_{T-1}\) with \(T-1\) degrees of freedom. In fact, since \(\mathbf{W}\) is a Gaussian strong white noise, the random variable \[ \left(W_{T+S}-\bar{\mathbf{W}}_{T}\right)/\sigma_{\mathbf{W}}\sqrt{1+\frac{1}{T}}\equiv Z \] is Gaussian distributed. Moreover, we have \[\begin{equation} \mathbf{E}\left[Z\right]=\frac{\mathbf{E}\left[W_{T+S}\right]-\mathbf{E}\left[\bar{\mathbf{W}}_{T}\right]} {\sigma_{\mathbf{W}}\sqrt{1+\frac{1}{T}}}=0 \end{equation}\] and \[\begin{equation} \mathbf{D}^{2}\left[Z\right]=\frac{\mathbf{D}^{2}\left[W_{T+S}\right]+\mathbf{D}^{2}\left[\bar{\mathbf{W}}_{T}\right]} {\sigma_{\mathbf{W}}^{2}\left(1+\frac{1}{T}\right)}=\frac{\sigma_{\mathbf{W}}^{2}+\frac{1}{T}\sigma_{\mathbf{W}}^{2}} {\sigma_{\mathbf{W}}^{2}\left(1+\frac{1}{T}\right)}=1. \end{equation}\] That is \(Z\sim N\left(0,1\right)\). On the other hand, the random variable \(\left(T-1\right)S_{\mathbf{W},T}^{2}/\sigma_{\mathbf{W}}^{2}\equiv Y\) has the chi-square distribution \(\chi_{T-1}^{2}\) with \(T-1\) degrees of freedom. It follows that the statistic \[\begin{equation} \frac{Z}{\sqrt{Y/\left( T-1\right) }}\equiv\frac{\frac{W_{T+S}-\bar{\mathbf{W}}_{T}}{\sigma_{\mathbf{W}}\sqrt{1+\frac{1}{T}}}} {\sqrt{\frac{\left(T-1\right)S_{\mathbf{W},T}^{2}}{\sigma_{\mathbf{W}}^{2}}/\left(T-1\right)}}=X \end{equation}\] is a Student random variable \(t_{T-1}\) with \(T-1\) degrees of freedom. 
As a consequence, we can write \[\begin{equation} \mathbf{P}\left(\left\vert\frac{W_{T+S}-\hat{W}_{T+S\mid T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}}\right\vert <t_{T-1,\alpha/2}\right)=1-\alpha, \end{equation}\] where the positive number \(t_{T-1,\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the Student \(t_{T-1}\) distribution with \(T-1\) degrees of freedom. Now, we have \[\begin{align} \left\vert \frac{W_{T+S}-\hat{W}_{T+S\mid T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}}\right\vert & <t_{T-1,\alpha/2}\Leftrightarrow-t_{T-1,\alpha/2}<\frac{W_{T+S}-\hat{W}_{T+S\mid T}}{S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}} <t_{T-1,\alpha/2}\\ & \Leftrightarrow\hat{W}_{T+S\mid T}-t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}} <W_{T+S}<\hat{W}_{T+S\mid T}+t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}\\ & \Leftrightarrow\bar{\mathbf{W}}_{T}-t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}} <W_{T+S}<\bar{\mathbf{W}}_{T}+t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}} \end{align}\] Therefore, \[\begin{equation} \mathbf{P}\left(\bar{\mathbf{W}}_{T}-t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}} <W_{T+S}<\bar{\mathbf{W}}_{T}+t_{T-1,\alpha/2}S_{\mathbf{W},T}\sqrt{1+\frac{1}{T}}\right) =1-\alpha, \end{equation}\] which shows that (9.14) is a prediction interval for the state \(W_{T+S}\), for any \(S\in\mathbb{N}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\).
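A realization of the prediction interval can be sketched as follows (Python, assuming numpy and scipy; names are ours). Here the sample standard deviation with divisor \(T-1\) is used, matching the chi-square step \(\left(T-1\right)S_{\mathbf{W},T}^{2}/\sigma_{\mathbf{W}}^{2}\sim\chi_{T-1}^{2}\) of the proof.

```python
# Sketch of the Gaussian prediction interval:
# wbar_T +/- t_{T-1,alpha/2} * s_T * sqrt(1 + 1/T), valid for any horizon S.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T = 100
w = rng.normal(size=T)                       # GWN(1) sample path

wbar, s = w.mean(), w.std(ddof=1)            # divisor T-1, as in the chi-square step
alpha = 0.05
t_crit = stats.t.isf(alpha / 2, df=T - 1)    # upper tail critical value t_{T-1,alpha/2}
half = t_crit * s * np.sqrt(1 + 1 / T)

pi = (wbar - half, wbar + half)              # 95% prediction interval for W_{T+S}
print(pi)
```

Note that the interval does not depend on the horizon \(S\): for a white noise, past observations carry no information about future states beyond the common mean and variance.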

10 Weak White Noises

Let \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) be a stochastic process of order \(2\) on a probability space \(\left(\Omega,\mathcal{E},\mathbf{P}\right)\equiv\Omega\) with state space \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\).

Definition 10.1 (Weak white noise) We say that \(\mathbf{W}\) is a weak white noise (WWN) if the random variables in \(\mathbf{W}\) have mean \(0\), the same variance-covariance matrix, and are uncorrelated.

To denote that \(\mathbf{W}\) is a weak white noise with state space \(\mathbb{R}^{N}\) we write \(\mathbf{W}\sim WWN^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\), where \(\Sigma_{\mathbf{W}}^{2}\) is the common variance-covariance matrix of the random variables in \(\mathbf{W}\). In case \(N=1\), the reference to the dimension \(N\) will be omitted and we write \(\mathbf{W}\sim WWN\left(\sigma_{\mathbf{W}}^{2}\right)\), where \(\sigma_{\mathbf{W}}^{2}\) is the common variance of the random variables in \(\mathbf{W}\).

From Definition 10.1, it immediately follows that a strong white noise is a weak white noise.

As mentioned above, weakening the order \(2\) prescription on strong white noises to order \(1\) would no longer allow us to regard strong white noises as a special case of weak white noises.

Let \(\mathbf{W}\sim WWN^{N}\left(\Sigma_{\mathbf{W}}^{2}\right)\). For simplicity, assume that \(\mathbb{T}\equiv\mathbb{N}\).

Proposition 10.1 (WWN as a WSS process) The process \(\mathbf{W}\) is weak-sense stationary.

Proposition 10.2 (WWN as an ergodic process in the mean) The process \(\mathbf{W}\) is mean-square ergodic in the mean.

Remark (Gaussian WWN). If \(\mathbf{W}\) is a Gaussian process, then \(\mathbf{W}\) is a Gaussian (strong) white noise.

Example 10.1 (A WWN which is not a SWN) Let \(\left(X_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{X}\sim GWN(\sigma_{\mathbf{X}}^{2})\). For simplicity, assume that \(\sigma_{\mathbf{X}}^{2}=1\). Fixed any \(p\in\mathbb{N}\), consider the process \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv \mathbf{W}\) given by \[\begin{equation} W_{t}\overset{\text{def}}{=}X_{t}\cdots X_{t+p},\quad\forall t\in\mathbb{N}. \end{equation}\] Then the process \(\mathbf{W}\) is a weak white noise which is a SSS process but, in general, is not a SWN.
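Example 10.1 can be checked numerically with \(p=1\), as in the following sketch (Python, assuming numpy; the helper `r1` is ours). Consecutive terms \(W_{t}\) and \(W_{t+1}\) share the factor \(X_{t+1}\), so they are uncorrelated but not independent: the squared series is serially correlated.

```python
# Numerical check of Example 10.1 with p = 1: W_t = X_t * X_{t+1}, X ~ GWN(1).
# W looks like white noise at the level of autocorrelations, but W^2 does not.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100001)
w = x[:-1] * x[1:]                       # W_t = X_t X_{t+1}

def r1(u):
    """Lag-1 sample autocorrelation."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    return np.sum(u[:-1] * u[1:]) / np.sum(u ** 2)

acf_w = r1(w)        # near 0: W behaves like a weak white noise
acf_w2 = r1(w ** 2)  # clearly positive: W is not an independent sequence
print(acf_w, acf_w2)
```

A short computation gives \(\mathrm{Corr}\left(W_{t}^{2},W_{t+1}^{2}\right)=1/4\) in this case, which the sample estimate approaches for long paths.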

10.1 Parameter Estimation

Let \(\mathbf{W}\sim WWN\left(\sigma_{\mathbf{W}}^{2}\right)\) and let \(\bar{\mathbf{W}}_{T}\) [resp. \(S_{\mathbf{W},T}^{2}\)] be the time average [resp. time variance] estimator of size \(T\) of \(\mathbf{W}\) (see Definitions 8.1 and 8.2).

Proposition 10.3 (Time average of a WWN) The statistic \(\bar{\mathbf{W}}_{T}\) is an unbiased estimator of \(\mu_{\mathbf{W}}=0\) with mean squared error given by \(Var\left(\bar{\mathbf{W}}_{T}\right)\).

10.2 Prediction of Future States and Prediction Intervals

11 Random Walks

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\). We assume that the random variable \(X_{0}\) has finite moment of order \(2\). We set \(\mu_{X_{0}}\equiv\mathbf{E}\left[X_{0}\right]\) and \(\Sigma_{X_{0}}\equiv Var\left(X_{0}\right)\).

Definition 11.1 (Random walk) We say that \(\mathbf{X}\) is a random walk if there exists a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\) on \(\Omega\) with states in \(\mathbb{R}^{N}\) and variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric matrix \(\Sigma_{\mathbf{W}}^{2}\), such that \(X_{0}\) is independent of the random variables in \(\mathbf{W}\) and we have \[\begin{equation} X_{t}=X_{t-1}+W_{t}, \tag{11.1} \end{equation}\] for every \(t\in\mathbb{N}\).

For several applications it is also convenient to introduce the following generalization.

Definition 11.2 (Random walk with drift and linear trend) We say that \(\mathbf{X}\) is a random walk with drift and linear trend if there exist \(\alpha,\beta\in\mathbb{R}^{N}\) and a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\) on \(\Omega\) with states in \(\mathbb{R}^{N}\) and variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric matrix \(\Sigma_{\mathbf{W}}^{2}\), such that \(X_{0}\) is independent of the random variables in \(\mathbf{W}\) and we have \[\begin{equation} X_{t}=\alpha + \beta t + X_{t-1} + W_{t}, \tag{11.2} \end{equation}\] for every \(t\in\mathbb{N}\).

The random variable [resp. distribution of the random variable] \(X_{0}\) is referred to as the initial state [resp. initial distribution] of the random walk \(\mathbf{X}\). In case \(X_{0}\) is a Dirac random variable concentrated at \(x_{0}\), for some \(x_{0}\in\mathbb{R}^{N}\), we call \(x_{0}\) the starting point of the random walk. The strong white noise \(\mathbf{W}\) is referred to as the state innovation of the process \(\mathbf{X}\). However, the explicit reference to the innovation process \(\mathbf{W}\) is often omitted when not necessary. When dealing with a random walk with drift and linear trend, the vector \(\alpha\) [resp. \(\beta\)] is referred to as the drift [resp. linear trend coefficient] of \(\mathbf{X}\). When we want to stress that \(\alpha\neq0\) and \(\beta=0\) [resp. \(\alpha=0\) and \(\beta\neq0\)] we call \(\mathbf{X}\) a random walk with drift and no linear trend [resp. with linear trend and no drift].

To denote that \(\mathbf{X}\) is a random walk with states in \(\mathbb{R}^{N}\) we write \(\mathbf{X}\sim RW^{N}\). In case \(N=1\), we neglect \(N\) and speak of real random walk.

Proposition 11.1 (Random walk representation) Assume that \(\mathbf{X}\) satisfies Equation (11.1). Then we have \[\begin{equation} X_{t}=X_{0}+\sum\limits_{s=1}^{t}W_{s}, \tag{11.3} \end{equation}\] for every \(t\in\mathbb{N}\). More generally, \[\begin{equation} X_{t}=X_{s}+\sum\limits_{r=s+1}^{t}W_{r}, \tag{11.4} \end{equation}\] for all \(s,t\in\mathbb{N}_{0}\) such that \(0\leq s<t\).

Proposition 11.2 (Random walk with drift and linear trend representation) Assume that \(\mathbf{X}\) satisfies Equation (11.2). Then we have \[\begin{equation} X_{t}=X_{0}+\alpha t+\frac{1}{2}\beta t\left(t+1\right)+\sum\limits_{s=1}^{t}W_{s}, \tag{11.5} \end{equation}\] for every \(t\in\mathbb{N}\). More generally, \[\begin{equation} X_{t}=X_{s}+\alpha\left(t-s\right)+\frac{1}{2}\beta\left(t-s\right)\left(t+s+1\right)+\sum\limits_{r=s+1}^{t}W_{r}, \tag{11.6} \end{equation}\] for all \(s,t\in\mathbb{N}_{0}\) such that \(0\leq s<t\).
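The representation in Proposition 11.2 can be verified numerically: iterating the defining recursion and evaluating the closed form on the same innovations must give identical paths. A Python sketch in the scalar case (coefficients, horizon, and seed are arbitrary choices):

```python
# Sanity check of Proposition 11.2: iterating the recursion
# X_t = alpha + beta*t + X_{t-1} + W_t reproduces the closed form
# X_t = X_0 + alpha*t + (1/2)*beta*t*(t+1) + sum_{s=1}^{t} W_s.
# Scalar case (N = 1); coefficients and seed are arbitrary choices.
import random

random.seed(1)
alpha, beta, x0, T = 0.3, -0.1, 5.0, 200
w = [random.gauss(0.0, 1.0) for _ in range(T + 1)]  # w[t] = W_t, w[0] unused

x = [x0]
for t in range(1, T + 1):
    x.append(alpha + beta * t + x[t - 1] + w[t])    # recursion (11.2)

closed = [x0 + alpha * t + 0.5 * beta * t * (t + 1) + sum(w[1:t + 1])
          for t in range(T + 1)]                    # representation (11.5)

max_gap = max(abs(a - b) for a, b in zip(x, closed))
print(max_gap)   # zero up to floating-point rounding
```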

Unless otherwise specified, in what follows we deal with random walks with drift and linear trend satisfying Equation (11.2).

Let \(\mathbf{X}\sim RW^{N}\) and let \(\left(\mathcal{F}_{t}^{X_{0},\mathbf{W}}\right)_{t\in\mathbb{N}_{0}}\equiv\mathfrak{F}^{X_{0},\mathbf{W}}\) be the filtration generated by the initial state \(X_{0}\) of \(\mathbf{X}\) and the innovation process \(\mathbf{W}\), that is \[\begin{equation} \mathcal{F}_{0}^{X_{0},\mathbf{W}} \overset{\text{def}}{=} \sigma\left(X_{0}\right) \quad\text{and}\quad \mathcal{F}_{t}^{X_{0},\mathbf{W}} \overset{\text{def}}{=} \sigma\left(X_{0},W_{1},\dots,W_{t}\right), \quad\forall t\in\mathbb{N}, \end{equation}\] where \(\sigma\left(X,Y,Z,\dots\right)\) denotes the \(\sigma\)-algebra generated by the random variables \(X,Y,Z,\dots\)

Proposition 11.3 (Random walk as adapted process) The random walk \(\mathbf{X}\) is adapted to \(\mathfrak{F}^{X_{0},\mathbf{W}}\).

Note that when \(X_{0}\equiv x_{0}\), the random walk \(\mathbf{X}\) is adapted to the filtration \(\left(\mathcal{F}_{t}^{\mathbf{W}}\right)_{t\in\mathbb{T}}\equiv\mathfrak{F}^{\mathbf{W}}\) generated by the innovation process \(\mathbf{W}\).

Proposition 11.4 (Order of a random walk) If the state innovation \(\mathbf{W}\) is a process of order \(K\), then the random walk \(\mathbf{X}\) is also a process of order \(K\), for every \(K\geq2\).

Proposition 11.5 (Independence of a random walk from future state innovations) The random variables \(X_{1},\dots,X_{t}\) in the random walk \(\mathbf{X}\) are independent of the random variables \(W_{t+1},W_{t+2}\dots\) in the state innovation process \(\mathbf{W}\), for every \(t\in\mathbb{N}\).

Proposition 11.6 (Markov property of a random walk) The random walk \(\mathbf{X}\) is a Markov process, that is \[\begin{equation} \mathbf{P}\left(X_{t}\in B\mid\mathcal{F}_{s}^{X_{0},\mathbf{W}}\right) =\mathbf{P}\left(X_{t}\in B\mid\sigma\left(X_{s}\right)\right), \tag{11.7} \end{equation}\] for every \(B\in\mathcal{B}\left(\mathbb{R}^{N}\right)\) and all \(s,t\in\mathbb{N}_{0}\) such that \(0\leq s<t\).

Proposition 11.7 (Martingale property of a random walk with no drift and no linear trend) Assume that \(\mathbf{X}\) has no drift and no linear trend. Then \(\mathbf{X}\) is a martingale, that is \[\begin{equation} \mathbf{E}\left[X_{t}\mid\mathcal{F}_{s}^{X_{0},W}\right] =X_{s}, \tag{11.8} \end{equation}\] for all \(s,t\in\mathbb{N}_{0}\) such that \(0\leq s<t\).

Proposition 11.8 (Mean function of a random walk) The mean function \(\mu_{\mathbf{X}}:\mathbb{N}_{0}\rightarrow\mathbb{R}^{N}\) of \(\mathbf{X}\) is given by \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\mu_{X_{0}}+\alpha t+\frac{1}{2}\beta t\left(t+1\right), \tag{11.9} \end{equation}\] for every \(t\in\mathbb{N}_{0}\).

Proposition 11.9 (Mean stationarity of a random walk) Assume that \(\mathbf{X}\) has no drift and no linear trend. Then \(\mathbf{X}\) is a mean stationary process, that is \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\mu_{X_{0}}, \tag{11.10} \end{equation}\] for every \(t\in\mathbb{N}_{0}\).

Proposition 11.10 (Variance-covariance function of a random walk) The variance-covariance function \(\Sigma_{\mathbf{X}}^{2}:\mathbb{N}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) of \(\mathbf{X}\) is given by \[\begin{equation} \Sigma_{\mathbf{X}}^{2}\left(t\right)=\Sigma_{X_{0}}+t\Sigma_{\mathbf{W}}^{2}, \tag{11.11} \end{equation}\] for every \(t\in\mathbb{N}_{0}\).

Proposition 11.11 (Autocovariance function of a random walk) The autocovariance function \(\gamma_{\mathbf{X}}:\mathbb{N}_{0}\times\mathbb{N}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) of \(\mathbf{X}\) is given by \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\Sigma_{X_{0}}+s\Sigma_{\mathbf{W}}^{2}, \tag{11.12} \end{equation}\] for all \(s,t\in\mathbb{N}_{0}\) such that \(s\leq t\).

Proposition 11.12 (Autocorrelation function of a random walk) The autocorrelation function \(\rho_{\mathbf{X}}:\mathbb{N}_{0}\times\mathbb{N}_{0}\rightarrow\mathbb{R}^{N}\times\mathbb{R}^{N}\) of \(\mathbf{X}\) is given by \[\begin{equation} \rho_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} \operatorname*{diag}\left(\Sigma_{X_{0}}\right)^{-1/2}\Sigma_{X_{0}} \operatorname*{diag}\left(\Sigma_{X_{0}}\right)^{-1/2}, & \text{if } 0=s=t,\\ \sqrt{\frac{s}{t}}\operatorname*{diag}\left(\frac{1}{s}\Sigma_{X_{0}} +\Sigma_{\mathbf{W}}^{2}\right)^{-1/2}\left(\frac{1}{s}\Sigma_{X_{0}} +\Sigma_{\mathbf{W}}^{2}\right)\operatorname*{diag}\left(\frac{1}{t}\Sigma_{X_{0}} +\Sigma_{\mathbf{W}}^{2}\right)^{-1/2}, & \text{if } 0<s\leq t. \end{array} \right. \tag{11.13} \end{equation}\] In particular, if \(X_{0}\equiv x_{0}\in\mathbb{R}^{N}\) we have \[\begin{equation} \rho_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} 0, & \text{if }0=s=t,\\ \sqrt{\frac{s}{t}}\operatorname*{diag}\left(\Sigma_{\mathbf{W}}^{2}\right)^{-1/2}\left(\Sigma_{\mathbf{W}}^{2}\right) \operatorname*{diag}\left(\Sigma_{\mathbf{W}}^{2}\right)^{-1/2}, & \text{if } 0<s\leq t. \end{array} \right. \tag{11.14} \end{equation}\]

Corollary 11.1 (Autocorrelation function of a real random walk) In case \(N=1\), we have \[\begin{equation} \rho_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} 1, & \text{if }0=s=t,\\ \sqrt{\frac{s}{t}\frac{\left(\frac{1}{s}\sigma_{X_{0}}^{2}+\sigma_{\mathbf{W}}^{2}\right)} {\left(\frac{1}{t}\sigma_{X_{0}}^{2}+\sigma_{\mathbf{W}}^{2}\right)}}, & \text{if }0<s\leq t. \end{array} \right. \tag{11.15} \end{equation}\] In particular, if \(X_{0}\equiv x_{0}\in\mathbb{R}\) we have \[\begin{equation} \rho_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} 0, & \text{if }0=s=t,\\ \sqrt{\frac{s}{t}}, & \text{if }0<s\leq t. \end{array} \right. \tag{11.16} \end{equation}\]
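The \(\sqrt{s/t}\) autocorrelation of Corollary 11.1 (degenerate initial state) is easy to confirm by Monte Carlo. The following Python sketch uses \(s=4\), \(t=16\), so the theoretical value is \(\sqrt{4/16}=0.5\); path count and seed are arbitrary choices.

```python
# Monte Carlo illustration of Corollary 11.1: for a real random walk started
# at a fixed point (no drift, no trend), Corr(X_s, X_t) = sqrt(s/t).
# With s = 4 and t = 16 the theoretical value is sqrt(4/16) = 0.5.
import random

random.seed(7)
s, t, reps = 4, 16, 40_000
xs, xt = [], []
for _ in range(reps):
    x = 0.0                                  # starting point x_0 = 0
    for step in range(1, t + 1):
        x += random.gauss(0.0, 1.0)          # driftless random walk step
        if step == s:
            xs.append(x)
    xt.append(x)

mxs, mxt = sum(xs) / reps, sum(xt) / reps
cov = sum((a - mxs) * (b - mxt) for a, b in zip(xs, xt)) / reps
vs = sum((a - mxs) ** 2 for a in xs) / reps
vt = sum((b - mxt) ** 2 for b in xt) / reps
rho = cov / (vs * vt) ** 0.5
print(rho)   # close to 0.5
```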

Proposition 11.13 (Gaussianity of a random walk) Assume that \(X_{0}\) is Gaussian, possibly degenerate. In addition, assume that the innovation process \(\mathbf{W}\) is Gaussian. Then we have \[\begin{equation} X_{t}\sim N\left(\mu_{\mathbf{X}}\left(t\right),\Sigma_{\mathbf{X}}^{2}\left(t\right)\right), \tag{11.17} \end{equation}\] for every \(t\in\mathbb{N}\), where \(\mu_{\mathbf{X}}\left(t\right)\) and \(\Sigma_{\mathbf{X}}^{2}\left(t\right)\) are given by (11.9) and (11.11), respectively. In addition, the process \(\mathbf{X}\) is Gaussian.

Definition 11.3 (Gaussianity of a random walk) In light of Proposition 11.13, we call a random walk with Gaussian (possibly degenerate) initial state and Gaussian innovation \(\mathbf{W}\) a Gaussian random walk.

Although a random walk is not stationary, not even weakly stationary, a simple transformation can turn it into an almost stationary process.

Definition 11.4 (Differenced random walk) We call differenced random walk the process \(\left(\Delta X_{t}\right)_{t\in\mathbb{N}}\equiv\Delta\mathbf{X}\) given by \[\begin{equation} \Delta X_{t}\overset{\text{def}}{=}X_{t}-X_{t-1},\quad\forall t\in\mathbb{N}. \tag{11.18} \end{equation}\]

Proposition 11.14 (Differenced random walk) The random variables in the differenced random walk \(\Delta\mathbf{X}\) satisfy the equation \[\begin{equation} \Delta X_{t}= \alpha+\beta t+W_{t}, \tag{11.19} \end{equation}\] for every \(t\in\mathbb{N}\).

Proof. Replacing \(s\) with \(t-1\) into Equation (11.6), we can write \[\begin{align} X_{t} & = X_{t-1}+\alpha\left(t-\left(t-1\right)\right) +\frac{1}{2}\beta\left(t-\left(t-1\right)\right)\left(t+\left(t-1\right)+1\right) +\sum\limits_{r=\left(t-1\right)+1}^{t}W_{r}\\ & = X_{t-1}+\alpha+\beta t+W_{t}, \end{align}\] for every \(t\in\mathbb{N}\). The desired (11.19) immediately follows.

Remark (Differenced random walk). The differenced random walk is a strong white noise with drift and linear trend. In particular, if \(\mathbf{X}\) has either no drift or no linear trend, \(\Delta\mathbf{X}\) is a strong white noise with either no drift or no linear trend.
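The identity \(\Delta X_{t}=\alpha+\beta t+W_{t}\) of Proposition 11.14 is exact given the innovations, not merely a distributional statement, so it can be checked path by path. A Python sketch (all numerical values and the seed are arbitrary choices):

```python
# Check of Proposition 11.14: differencing a random walk with drift and
# linear trend recovers the innovation plus the deterministic part,
# Delta X_t = alpha + beta*t + W_t, exactly along each path.
import random

random.seed(3)
alpha, beta, T = 1.0, 0.5, 100
w = [random.gauss(0.0, 2.0) for _ in range(T + 1)]  # w[t] = W_t, w[0] unused

x = [10.0]                                          # starting point x_0
for t in range(1, T + 1):
    x.append(alpha + beta * t + x[t - 1] + w[t])    # recursion (11.2)

dx = [x[t] - x[t - 1] for t in range(1, T + 1)]     # differenced walk
gap = max(abs(dx[t - 1] - (alpha + beta * t + w[t])) for t in range(1, T + 1))
print(gap)   # zero up to floating-point rounding
```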

11.1 Parameter Estimation

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be a random walk on a probability space \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\), which satisfies Equation (11.2) for a state innovation \(\mathbf{W}\sim SWN^{N}\left(\Sigma^{2}_{\mathbf{W}}\right)\) and some \(\alpha,\beta\in \mathbb{R}^{N}\). We want to obtain estimates for the parameters \(\Sigma^{2}_{\mathbf{W}}\), \(\alpha\), and \(\beta\). We know that the random walk \(\mathbf{X}\) is not stationary, not even weakly. Therefore, \(\mathbf{X}\) is not ergodic. This prevents the direct use of the method of moments. However, we know that the differenced random walk \(\Delta\mathbf{X}\) turns out to be a strong white noise with drift and linear trend. Therefore, we can apply the procedures in Section 10.1.
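As a concrete illustration of this route, the Python sketch below differences a simulated scalar walk and fits \(\Delta X_{t}=\alpha+\beta t+W_{t}\) by ordinary least squares, estimating \(\sigma_{\mathbf{W}}^{2}\) from the residuals. This is one simple estimation scheme consistent with the idea above, not necessarily the exact procedure of Section 10.1; the true parameter values, sample size, and seed are arbitrary choices.

```python
# Estimate alpha, beta, sigma_W^2 from a simulated random walk with drift and
# linear trend: difference the path, then run a hand-rolled OLS of the
# differences on (1, t), and use the residual variance for sigma_W^2.
import random

random.seed(11)
alpha, beta, sigma, T = 0.8, 0.05, 1.5, 5_000
w = [random.gauss(0.0, sigma) for _ in range(T + 1)]

x = [0.0]
for t in range(1, T + 1):
    x.append(alpha + beta * t + x[t - 1] + w[t])    # recursion (11.2)

ts = list(range(1, T + 1))
dx = [x[t] - x[t - 1] for t in ts]                  # Delta X_t = alpha + beta*t + W_t

tbar = sum(ts) / T
ybar = sum(dx) / T
sxx = sum((t - tbar) ** 2 for t in ts)
sxy = sum((t - tbar) * (y - ybar) for t, y in zip(ts, dx))
beta_hat = sxy / sxx                                # OLS slope
alpha_hat = ybar - beta_hat * tbar                  # OLS intercept
resid = [y - alpha_hat - beta_hat * t for t, y in zip(ts, dx)]
sigma2_hat = sum(r * r for r in resid) / (T - 2)    # unbiased residual variance
print(alpha_hat, beta_hat, sigma2_hat)
```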

Proposition 11.15 (Point estimator and confidence intervals for Gaussian random walks) In case \(N=1\), assume that \(\mathbf{X}\) satisfies Equation (11.1) with Gaussian state innovation \(\mathbf{W}\), in symbols \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\), for some \(\sigma_{\mathbf{W}}>0\). Then, an estimator for \(\sigma_{\mathbf{W}}^{2}\) and the corresponding confidence intervals can be obtained via the application of Proposition 11.14 and the corresponding estimator and confidence intervals for \(\mathbf{W}\) (see Propositions 9.6 and 9.8).

11.2 Prediction of Future States and Prediction Intervals

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be a random walk on a probability space \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\), which satisfies Equation (11.2) for a state innovation \(\mathbf{W}\sim SWN^{N}\left(\Sigma^{2}_{\mathbf{W}}\right)\) and some \(\alpha,\beta\in \mathbb{R}^{N}\). For any \(S,T\in\mathbb{N}\), write \(X_{T+S}\) for the \(S\)th future state of the process \(\mathbf{X}\) with respect to the current state \(X_{T}\) and write \(\hat{X}_{T+S\mid T}\) for the minimum square error predictor of the \(S\)th future state of the process \(\mathbf{X}\), given the information \(\mathcal{F}_{T}^{X_{0},\mathbf{W}}\equiv\sigma\left(X_{0},W_{1},\dots ,W_{T}\right)\) generated by the process \(\mathbf{X}\) itself up to and including time \(T\).

Proposition 11.16 (Future state predictor of a random walk) We have \[\begin{equation} \hat{X}_{T+S\mid T}=\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)+X_{T}, \tag{11.20} \end{equation}\] for every \(S\in \mathbb{N}\). In particular, if \(\alpha=\beta=0\), we have \[\begin{equation} \hat{X}_{T+S\mid T}=X_{T}. \tag{11.21} \end{equation}\]

Proof. We know that \[\begin{equation} \hat{X}_{T+S\mid T}=\mathbf{E}\left[ X_{T+S}\mid \mathcal{F}_{T}^{X_{0}, \mathbf{W}}\right]. \end{equation}\] On the other hand, thanks to Equation (11.6), where \(t\equiv T+S\) and \(s\equiv T\), we can write \[\begin{equation} X_{T+S}=X_{T}+\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)+\sum_{t=T+1}^{T+S}W_{t}, \tag{11.22} \end{equation}\] for all \(S,T\in\mathbb{N}\). Therefore, by virtue of the linearity of the conditional expectation operator, we obtain \[\begin{eqnarray} \hat{X}_{T+S\mid T} &=&\mathbf{E}\left[X_{T}+\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)+\sum_{t=T+1}^{T+S}W_{t} \mid \mathcal{F}_{T}^{X_{0},\mathbf{W}}\right]\\ &=&\mathbf{E}\left[ X_{T}\mid \mathcal{F}_{T}^{X_{0},\mathbf{W}}\right] +\mathbf{E}\left[\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right) \mid\mathcal{F}_{T}^{X_{0},\mathbf{W}}\right]+\sum_{t=T+1}^{T+S}\mathbf{E}\left[W_{t}\mid \mathcal{F}_{T}^{X_{0},\mathbf{W}}\right]. \tag{11.23} \end{eqnarray}\] Now, \(X_{T}\) is observable with respect to the information represented by \(\mathcal{F}_{T}^{X_{0},\mathbf{W}}\) and \(\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)\in\mathbb{R}^{N}\). Hence, \[\begin{equation} \mathbf{E}\left[ X_{T}\mid \mathcal{F}_{T}^{X_{0},\mathbf{W}}\right]=X_{T} \quad\text{and}\quad \mathbf{E}\left[\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)\mid\mathcal{F}_{T}^{X_{0},\mathbf{W}}\right] =\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right), \tag{11.24} \end{equation}\] for all \(S,T\in \mathbb{N}\). In addition, the variables \(W_{T+1},\dots,W_{T+S}\) are independent of the information represented by \(\mathcal{F}_{T}^{X_{0},\mathbf{W}}\). It follows \[\begin{equation} \mathbf{E}\left[W_{t}\mid\mathcal{F}_{T}^{X_{0},\mathbf{W}}\right]=\mathbf{E}\left[W_{t}\right]=0, \tag{11.25} \end{equation}\] for \(t=T+1,\dots,T+S\). Combining (11.23)-(11.25), the desired (11.20) follows.
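Since the predictor is a conditional expectation, averaging many simulated continuations of the walk from a fixed current state must reproduce it. A Python sketch of this check (all numerical values and the seed are arbitrary choices):

```python
# Monte Carlo illustration of Proposition 11.16: the average of many simulated
# continuations from a fixed current state X_T approaches the predictor
# X_T + alpha*S + (1/2)*beta*S*(2T + S + 1).
import random

random.seed(5)
alpha, beta, sigma = 0.2, 0.1, 1.0
T, S, xT, reps = 10, 5, 3.0, 100_000

predictor = xT + alpha * S + 0.5 * beta * S * (2 * T + S + 1)  # Eq. (11.20)

acc = 0.0
for _ in range(reps):
    x = xT
    for t in range(T + 1, T + S + 1):        # continue the recursion (11.2)
        x = alpha + beta * t + x + random.gauss(0.0, sigma)
    acc += x
mc_mean = acc / reps
print(predictor, mc_mean)   # the two values agree up to Monte Carlo error
```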

Proposition 11.17 (Prediction intervals for Gaussian random walks) In case \(N=1\), assume that \(\mathbf{X}\) satisfies Equation (11.2) and \(\mathbf{W}\) is Gaussian, that is \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\), for some \(\sigma_{\mathbf{W}}>0\). Then, a prediction interval for the state \(X_{T+S}\) at the confidence level of \(100\left(1-\alpha\right)\%\) is given by \[ \left(X_{T}+\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)-z_{\alpha/2}\sigma_{\mathbf{W}}\sqrt{S},\ X_{T}+\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)+z_{\alpha/2}\sigma_{\mathbf{W}}\sqrt{S}\right), \tag{11.26} \] for any \(S\in\mathbb{N}\) and any \(\alpha\in\left(0,1\right)\), where \(z_{\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the standard Gaussian distribution. In particular, if \(\alpha=\beta=0\), the prediction interval becomes \[\begin{equation} \left(X_{T}-z_{\alpha/2}\sigma_{\mathbf{W}}\sqrt{S},\ X_{T}+z_{\alpha/2}\sigma_{\mathbf{W}}\sqrt{S}\right). \tag{11.27} \end{equation}\]

Proof. By virtue of Equations (11.20) and (11.22), we have \[\begin{equation} X_{T+S}-\hat{X}_{T+S\mid T}=\sum_{t=T+1}^{T+S}W_{t}. \end{equation}\] As a consequence, \[ \mathbf{D}^{2}\left[X_{T+S}-\hat{X}_{T+S\mid T}\right] =\mathbf{D}^{2}\left[\sum_{t=T+1}^{T+S}W_{t}\right] =\sum_{t=T+1}^{T+S}\mathbf{D}^{2}\left[W_{t}\right] =\sigma_{\mathbf{W}}^{2}S. \] From this and the Gaussianity assumption on \(\mathbf{W}\), it follows that the random variable \(X_{T+S}-\hat{X}_{T+S\mid T}\) is Gaussian distributed with mean zero and variance \(\sigma_{\mathbf{W}}^{2}S\). Then we can write \[\begin{equation} \mathbf{P}\left(\left\vert\frac{X_{T+S}-\hat{X}_{T+S\mid T}}{\sigma_{\mathbf{W}}\sqrt{S}}\right\vert < z_{\alpha/2}\right) = 1-\alpha, \end{equation}\] where \(z_{\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the standard Gaussian distribution. On the other hand, still considering Equation (11.20), we obtain \[\begin{eqnarray} &&\left\vert\frac{X_{T+S}-\hat{X}_{T+S\mid T}}{\sigma_{\mathbf{W}}\sqrt{S}}\right\vert<z_{\alpha/2}\\ &&\Leftrightarrow -z_{\alpha/2}<\frac{X_{T+S}-\hat{X}_{T+S\mid T}}{\sigma_{\mathbf{W}}\sqrt{S}}<z_{\alpha/2}\\ &&\Leftrightarrow \hat{X}_{T+S\mid T}-z_{\alpha/2}\sigma_{\mathbf{W}}\sqrt{S}<X_{T+S} <\hat{X}_{T+S\mid T}+z_{\alpha/2}\sigma_{\mathbf{W}}\sqrt{S}\\ &&\Leftrightarrow X_{T}+\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)-z_{\alpha/2}\sigma_{\mathbf{W}}\sqrt{S} <X_{T+S}<X_{T}+\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)+z_{\alpha/2}\sigma_{\mathbf{W}}\sqrt{S}, \end{eqnarray}\] as desired.
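The interval of Proposition 11.17 can be validated empirically: over many simulated paths the realized future state should fall inside the \(95\%\) interval about \(95\%\) of the time. A Python sketch for a driftless walk with known \(\sigma_{\mathbf{W}}\) (path count and seed are arbitrary choices):

```python
# Empirical coverage check for Proposition 11.17: the Gaussian 95% prediction
# interval (11.27) should contain the realized future state about 95% of the
# time.  Driftless, trendless walk with known sigma_W; z_{0.025} ~ 1.959964.
import random

random.seed(13)
sigma, S, reps, z = 1.0, 8, 40_000, 1.959964
half_width = z * sigma * S ** 0.5           # interval half-width from (11.27)

hits = 0
for _ in range(reps):
    xT = 0.0                                # current state (held fixed)
    xTS = xT + sum(random.gauss(0.0, sigma) for _ in range(S))
    if xT - half_width < xTS < xT + half_width:
        hits += 1
coverage = hits / reps
print(coverage)   # close to 0.95
```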

Under the assumptions of Proposition 11.17, a realization of the prediction interval for the state \(X_{T+S}\) at the confidence level of \(100\left(1-\alpha\right)\%\) is given by \[ \left(x_{T}+\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)-z_{\alpha/2}\hat{\sigma}_{\mathbf{W}}\sqrt{S},\ x_{T}+\alpha S+\frac{1}{2}\beta S\left(2T+S+1\right)+z_{\alpha/2}\hat{\sigma}_{\mathbf{W}}\sqrt{S}\right), \tag{11.28} \] for any \(S\in\mathbb{N}\) and any \(\alpha\in\left(0,1\right)\), where \(x_{T}\) is the realization of the state \(X_{T}\) and \(\hat{\sigma}_{\mathbf{W}}\) is the point estimate of \(\sigma_{\mathbf{W}}\). In particular, if \(\alpha=\beta=0\), the realization of the prediction interval becomes \[\begin{equation} \left(x_{T}-z_{\alpha/2}\hat{\sigma}_{\mathbf{W}}\sqrt{S},\ x_{T}+z_{\alpha/2}\hat{\sigma}_{\mathbf{W}}\sqrt{S}\right). \tag{11.29} \end{equation}\]

12 Autoregressive (AR) Processes

12.1 Autoregressive Processes of Order \(1\) - AR(1) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\). We assume that the random variable \(X_{0}\) has finite moment of order \(2\). We set \(\mathbf{E}\left[X_{0}\right]\equiv\mu_{X_{0}}\) and \(Var\left(X_{0}\right)\equiv\Sigma_{X_{0}}\).

Definition 12.1 (AR(1) process) We say that \(\mathbf{X}\) is an autoregressive process of order \(1\) if there exist \(\phi\in\mathbb{R}^{N}\times\mathbb{R}^{N}\) and a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\) on \(\Omega\) with states in \(\mathbb{R}^{N}\) and variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric matrix \(\Sigma_{\mathbf{W}}^{2}\), such that \(X_{0}\) is independent of the random variables in \(\mathbf{W}\) and the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\phi X_{t-1}+W_{t}, \tag{12.1} \end{equation}\] for every \(t\in\mathbb{N}\).

For several applications it is also convenient to introduce the following generalization.

Definition 12.2 (AR(1) process with drift and linear trend) We say that \(\mathbf{X}\) is an autoregressive process of order \(1\) with drift and linear trend if there exist \(\alpha,\beta\in\mathbb{R}^{N}\), \(\phi\in\mathbb{R}^{N}\times\mathbb{R}^{N}\), and a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\) on \(\Omega\) with states in \(\mathbb{R}^{N}\) and variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric matrix \(\Sigma_{\mathbf{W}}^{2}\), such that \(X_{0}\) is independent of \(\mathbf{W}\) and we have \[\begin{equation} X_{t}=\alpha + \beta t + \phi X_{t-1} + W_{t}, \tag{12.2} \end{equation}\] for every \(t\in\mathbb{N}\).

The random vector [resp. the distribution of the random vector] \(X_{0}\) is referred to as the initial state [resp. initial distribution] of the autoregressive process \(\mathbf{X}\); in case \(X_{0}\equiv x_{0}\in\mathbb{R}^{N}\), we also call \(x_{0}\) the starting point of the autoregressive process. The matrix \(\phi\) is referred to as the regression coefficient of \(\mathbf{X}\); the vector \(\alpha\) [resp. \(\beta\)] is referred to as the drift [resp. linear trend coefficient] of \(\mathbf{X}\); the strong white noise \(\mathbf{W}\) is referred to as the state innovation of the autoregressive process \(\mathbf{X}\). When we want to stress that \(\alpha\neq0\) and \(\beta=0\) [resp. \(\alpha=0\) and \(\beta\neq0\)] we call \(\mathbf{X}\) an autoregressive process of order \(1\) with drift and no linear trend [resp. autoregressive process of order \(1\) with linear trend and no drift]. In case \(N=1\), we usually speak of a real autoregressive process of order \(1\), neglecting to mention \(N\). Also the explicit reference to the state innovation \(\mathbf{W}\) is often omitted when not necessary. Note that when \(\phi\equiv I_{N}\), where \(I_{N}\) is the identity matrix in \(\mathbb{R}^{N}\times\mathbb{R}^{N}\), Equation (12.1) becomes the random walk equation.

To denote that \(\mathbf{X}\) is an \(N\)-variate real autoregressive process of order \(1\) we write \(\mathbf{X}\sim AR(1)^{N}\). In case \(N=1\), we neglect \(N\).

In some circumstances, it is more appropriate to consider an autoregressive process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) and a state innovation \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) with time set \(\mathbb{T}\equiv\mathbb{Z}\). In this case, we require that the random variables in \(\mathbf{X}\) satisfy Equation (12.1) for every \(t\in\mathbb{Z}\). Clearly, in this case, there is no mention of the initial state of the process.

Unless otherwise specified, in what follows we deal with \(AR(1)\) processes \(\mathbf{X}\) with drift and linear trend, satisfying Equation (12.2), for which the state innovation \(\mathbf{W}\) is a real strong white noise with variance \(\sigma_{\mathbf{W}}^{2}\), for some \(\sigma_{\mathbf{W}}>0\), and the coefficients \(\alpha\), \(\beta\), and \(\phi\) are all real numbers. Recall that in this case the autocovariance and autocorrelation functions of \(\mathbf{X}\) are symmetric, that is \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\gamma_{\mathbf{X}}\left(t,s\right) \quad\text{and}\quad \rho_{\mathbf{X}}\left(s,t\right)=\rho_{\mathbf{X}}\left(t,s\right) \end{equation}\] for all \(s,t\in\mathbb{N}\). Furthermore, we will make the technical assumption \(\phi\neq1\) to distinguish an autoregressive process of order \(1\) from a random walk.

The following result states that we can think of an \(AR(1)\) process with drift and linear trend \(\mathbf{X}\) as a deterministic linear trend plus an \(AR(1)\) process with the same regression coefficient and no drift or linear trend. This is rather useful for parameter estimation.

Proposition 12.1 (State Innovation of AR(1) Processes) Assume that \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) is an \(AR(1)\) process satisfying Equation (12.2), for some \(\phi,\alpha,\beta\in\mathbb{R}\) and some state innovation \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\). Then there exist \(\tilde{\alpha},\tilde{\beta}\in\mathbb{R}\) such that we can write \[\begin{equation} X_{t}=\tilde{\alpha}+\tilde{\beta}t+Y_{t}, \tag{12.3} \end{equation}\] for every \(t\in\mathbb{N}\), where \(\left(Y_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{Y}\) is an \(AR(1)\) process with state innovation \(\mathbf{W}\) which satisfies \[\begin{equation} Y_{t}=\phi Y_{t-1}+W_{t}, \tag{12.4} \end{equation}\] for every \(t\in\mathbb{N}\). Conversely, assume that \(\mathbf{X}\) is a process satisfying Equation (12.3), for some \(\tilde{\alpha},\tilde{\beta}\in\mathbb{R}\), where \(\mathbf{Y}\) is an \(AR(1)\) process with innovation \(\mathbf{W}\) solving (12.4), for some \(\phi\in\mathbb{R}\). Then \(\mathbf{X}\) is an \(AR(1)\) process with innovation \(\mathbf{W}\) satisfying Equation (12.2), for suitable \(\alpha,\beta\in\mathbb{R}\).

Proof. Set \[\begin{equation} Y_{t}\overset{\text{def}}{=}-\frac{\alpha\left(1-\phi\right)-\beta\phi}{\left(1-\phi\right)^{2}} -\frac{\beta}{1-\phi}t+X_{t},\quad\forall t\in\mathbb{N}. \tag{12.5} \end{equation}\] Since \(X_{t}\) satisfies Equation (12.2), from (12.5), we obtain \[\begin{align} Y_{t} & =-\frac{\alpha\left(1-\phi\right)-\beta\phi}{\left(1-\phi\right)^{2}} -\frac{\beta}{1-\phi}t+\alpha+\beta t+\phi X_{t-1}+W_{t}\\ & =\frac{\alpha\left(1-\phi\right)^{2}-\alpha\left(1-\phi\right) +\beta\phi}{\left(1-\phi\right)^{2}}+\frac{\beta\left(1-\phi\right) -\beta}{1-\phi}t+\phi X_{t-1}+W_{t}\\ & =-\phi\frac{\alpha\left(1-\phi\right)-\beta}{\left(1-\phi\right)^{2}} -\phi\frac{\beta}{1-\phi}t+\phi X_{t-1}+W_{t}\\ & =-\phi\frac{\alpha\left(1-\phi\right)-\beta}{\left(1-\phi\right)^{2}} -\phi\frac{\beta}{1-\phi}-\phi\frac{\beta}{1-\phi}\left(t-1\right)+\phi X_{t-1}+W_{t}\\ & =-\phi\frac{\alpha\left(1-\phi\right)-\beta\phi}{\left(1-\phi\right)^{2}} -\phi\frac{\beta}{1-\phi}\left(t-1\right)+\phi X_{t-1}+W_{t}\\ & =\phi Y_{t-1}+W_{t} \end{align}\] for every \(t\in\mathbb{N}\). Therefore, \(Y_{t}\) solves Equation (12.4). Moreover, writing \[\begin{equation} \tilde{\alpha}\equiv\frac{\alpha\left(1-\phi\right)-\beta\phi}{\left(1-\phi\right)^{2}} \qquad\text{and}\qquad\tilde{\beta}\equiv\frac{\beta}{1-\phi} \tag{12.6} \end{equation}\] also Equation (12.3) is trivially satisfied. Conversely, assume that \(\mathbf{X}\) is a process satisfying Equation (12.3), for some \(\tilde{\alpha},\tilde{\beta}\in\mathbb{R}\), then we have \[\begin{equation} X_{t-1}=\tilde{\alpha}+\tilde{\beta}\left(t-1\right)+Y_{t-1}, \tag{12.7} \end{equation}\] for every \(t\in\mathbb{N}\). 
Hence, combining (12.3) with (12.4) and (12.7), it follows \[\begin{equation} X_{t}-\tilde{\alpha}-\tilde{\beta}t=\phi X_{t-1}-\phi\tilde{\alpha}-\phi\tilde{\beta}\left(t-1\right)+W_{t}, \end{equation}\] that is \[\begin{equation} X_{t}=\tilde{\alpha}\left(1-\phi\right)+\tilde{\beta}\phi+\tilde{\beta}\left(1-\phi\right)t+\phi X_{t-1}+W_{t}, \end{equation}\] for every \(t\in\mathbb{N}\). In the end, \(\mathbf{X}\) satisfies Equation (12.2) for \[\begin{equation} \alpha=\tilde{\alpha}\left(1-\phi\right)+\tilde{\beta}\phi\qquad \text{and}\qquad \beta=\tilde{\beta}\left(1-\phi\right). \tag{12.8} \end{equation}\] Finally, note that Equations (12.6) and (12.8) are equivalent.
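The detrending transformation of the proof is an exact pathwise identity, so it can be checked numerically: subtracting \(\tilde{\alpha}+\tilde{\beta}t\), with the coefficients from (12.6), from a simulated \(AR(1)\) path with drift and trend must leave a process satisfying \(Y_{t}=\phi Y_{t-1}+W_{t}\). A Python sketch (coefficients and seed are arbitrary choices, with \(\phi\neq1\)):

```python
# Numerical check of Proposition 12.1: removing the deterministic part
# alpha_tilde + beta_tilde*t, with alpha_tilde and beta_tilde as in (12.6),
# turns an AR(1) process with drift and linear trend into an AR(1) process
# with neither.
import random

random.seed(17)
phi, alpha, beta, T = 0.6, 1.0, 0.2, 300
w = [random.gauss(0.0, 1.0) for _ in range(T + 1)]  # w[t] = W_t, w[0] unused

x = [0.0]
for t in range(1, T + 1):
    x.append(alpha + beta * t + phi * x[t - 1] + w[t])    # recursion (12.2)

a_t = (alpha * (1 - phi) - beta * phi) / (1 - phi) ** 2   # alpha_tilde, (12.6)
b_t = beta / (1 - phi)                                    # beta_tilde,  (12.6)
y = [x[t] - a_t - b_t * t for t in range(T + 1)]          # detrended process

gap = max(abs(y[t] - (phi * y[t - 1] + w[t])) for t in range(1, T + 1))
print(gap)   # zero up to floating-point rounding
```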

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be an \(AR(1)\) process satisfying Equation (12.2), for some \(\phi,\alpha,\beta\in\mathbb{R}\) and some state innovation \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\).

Proposition 12.2 (AR(1) process representation) We have \[\begin{equation} X_{t}=\phi^{t}X_{0}+g\left(t;\alpha,\beta,\phi\right)+\sum_{s=1}^{t}\phi^{t-s}W_{s}, \tag{12.9} \end{equation}\] for every \(t\in\mathbb{N}\), where \[\begin{equation} g\left(t;\alpha,\beta,\phi\right)\equiv\alpha\frac{1-\phi^{t}}{1-\phi} +\beta\frac{t-\left(t+1\right)\phi+\phi^{t+1}}{\left(1-\phi\right)^{2}}. \tag{12.10} \end{equation}\] More generally, \[\begin{equation} X_{t}=\phi^{t-s}X_{s}+g\left(s,t;\alpha,\beta,\phi\right)+\sum_{r=s+1}^{t}\phi^{t-r}W_{r}, \tag{12.11} \end{equation}\] for all \(s,t\in\mathbb{N}_{0}\) such that \(0\leq s<t\), where \[\begin{equation} g\left(s,t;\alpha,\beta,\phi\right)\equiv\alpha\frac{1-\phi^{t-s}}{1-\phi} +\beta\frac{t-\left(t+1\right)\phi-s\phi^{t-s}+\left(s+1\right)\phi^{t-s+1}}{\left(1-\phi\right)^{2}}. \tag{12.12} \end{equation}\] In particular, if \(\beta=0\), we have \[\begin{equation} X_{t}=\phi^{t}X_{0}+\alpha\frac{1-\phi^{t}}{1-\phi}+\sum_{s=1}^{t}\phi^{t-s}W_{s}, \end{equation}\] for every \(t\in\mathbb{N}\), and \[\begin{equation} X_{t}=\phi^{t-s}X_{s}+\alpha\frac{1-\phi^{t-s}}{1-\phi}+\sum_{r=s+1}^{t}\phi^{t-r}W_{r}, \end{equation}\] for all \(s,t\in\mathbb{N}_{0}\) such that \(0\leq s<t\).

Proof. According to (12.2), Equation (12.9) is clearly true for \(t=1\). Assume inductively that (12.9) holds true for some \(t\geq1\) and consider the case \(t+1\). We can then write \[\begin{align} X_{t+1} & =\alpha+\beta\left(t+1\right)+\phi X_{t}+W_{t+1}\\ & =\alpha+\beta\left(t+1\right)+\phi\left(\phi^{t}X_{0}+\alpha \frac{1-\phi^{t}}{1-\phi}+\beta\frac{t-\left(t+1\right)\phi+\phi^{t+1}}{\left(1-\phi\right)^{2}} +\sum_{s=1}^{t}\phi^{t-s}W_{s}\right)+W_{t+1}\\ & =\phi^{t+1}X_{0}+\alpha\left(\frac{1-\phi^{t}}{1-\phi}\phi+1\right) +\beta\left(\frac{t-\left(t+1\right)\phi+\phi^{t+1}}{\left(1-\phi\right)^{2}}\phi+t+1\right) +\sum_{s=1}^{t}\phi^{t+1-s}W_{s}+W_{t+1}\\ & =\phi^{t+1}X_{0}+\alpha\frac{1-\phi^{t+1}}{1-\phi} +\beta\frac{t+1-\left(t+2\right)\phi+\phi^{t+2}}{\left(1-\phi\right)^{2}}+\sum_{s=1}^{t+1}\phi^{t+1-s}W_{s}, \end{align}\] which is the desired Equation (12.9) in the case \(t+1\). By virtue of the Induction Principle, we can then conclude that Equation (12.9) holds true for every \(t\geq1\). 
Now, considering any \(s\) such that \(0\leq s\leq t\), we can write \[\begin{align} X_{t} & =\phi^{t}X_{0}+\alpha\frac{1-\phi^{t}}{1-\phi} +\beta\frac{t-\left(t+1\right)\phi+\phi^{t+1}}{\left(1-\phi\right)^{2}}+\sum_{s=1}^{t}\phi^{t-s}W_{s}\\ & =\phi^{t-s}\phi^{s}X_{0}+\alpha\frac{1-\phi^{s}}{1-\phi}\phi^{t-s}+\frac{1-\phi^{t-s}}{1-\phi}\alpha\\ & +\beta\frac{s-\left(s+1\right)\phi+\phi^{s+1}}{\left(1-\phi\right)^{2}}\phi^{t-s} +\beta\frac{t-\left(t+1\right)\phi-s\phi^{t-s}+\left(s+1\right)\phi^{t-s+1}}{\left(1-\phi\right)^{2}}\\ & +\phi^{t-s}\sum_{r=1}^{s} \phi^{s-r}W_{r}+\sum_{r=s+1}^{t}\phi^{t-r}W_{r}\\ & =\phi^{t-s}\left(\phi^{s}X_{0}+\alpha\frac{1-\phi^{s}}{1-\phi} +\beta\frac{s-\left(s+1\right)\phi+\phi^{s+1}}{\left(1-\phi\right)^{2}}+ \sum_{r=1}^{s}\phi^{s-r}W_{r}\right) \\ & +\alpha\frac{1-\phi^{t-s}}{1-\phi} +\beta\frac{t-\left(t+1\right)\phi-s\phi^{t-s}+\left(s+1\right)\phi^{t-s+1}}{\left(1-\phi\right)^{2}}+ \sum_{r=s+1}^{t}\phi^{t-r}W_{r}, \end{align}\] which is the desired (12.11).
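As with the random walk, the closed-form representation (12.9)-(12.10) can be checked against the recursion on a simulated path. A Python sketch (coefficients and seed are arbitrary choices, with \(\phi\neq1\)):

```python
# Sanity check of Proposition 12.2: the closed form
# X_t = phi^t X_0 + g(t; alpha, beta, phi) + sum_{s=1}^{t} phi^(t-s) W_s,
# with g as in (12.10), matches the recursion X_t = alpha + beta*t + phi*X_{t-1} + W_t.
import random

random.seed(23)
phi, alpha, beta, x0, T = 0.8, 0.5, 0.1, 2.0, 100
w = [random.gauss(0.0, 1.0) for _ in range(T + 1)]  # w[t] = W_t, w[0] unused

x = [x0]
for t in range(1, T + 1):
    x.append(alpha + beta * t + phi * x[t - 1] + w[t])    # recursion (12.2)

def g(t):
    """Deterministic part g(t; alpha, beta, phi) from (12.10)."""
    return (alpha * (1 - phi ** t) / (1 - phi)
            + beta * (t - (t + 1) * phi + phi ** (t + 1)) / (1 - phi) ** 2)

closed = [phi ** t * x0 + g(t)
          + sum(phi ** (t - s) * w[s] for s in range(1, t + 1))
          for t in range(T + 1)]                          # representation (12.9)

gap = max(abs(a - b) for a, b in zip(x, closed))
print(gap)   # zero up to floating-point rounding
```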

Let \(\left(\mathcal{F}_{t}^{X_{0},\mathbf{W}}\right)_{t\in\mathbb{N}_{0}}\equiv\mathfrak{F}^{X_{0},\mathbf{W}}\) be the filtration generated by the initial state \(X_{0}\) of the \(AR\left(1\right)\) process \(\mathbf{X}\) and the innovation process \(\mathbf{W}\), that is \[\begin{equation} \mathcal{F}_{0}^{X_{0},\mathbf{W}}\overset{\text{def}}{=}\sigma\left(X_{0}\right) \quad\text{and}\quad \mathcal{F}_{t}^{X_{0},\mathbf{W}}\overset{\text{def}}{=} \sigma\left(X_{0},W_{1},\dots,W_{t}\right),\quad\forall t\in\mathbb{N}, \end{equation}\] where \(\sigma\left(X,Y,Z,\dots\right)\) denotes the \(\sigma\)-algebra generated by the random variables \(X,Y,Z,\dots\)

Proposition 12.3 (AR(1) process as adapted process) The \(AR\left(1\right)\) process \(\mathbf{X}\) is adapted to \(\mathfrak{F}^{X_{0},\mathbf{W}}\).

Proof. The claim is an immediate consequence of Equations (12.9) and (12.10).

Note that when \(X_{0}\equiv x_{0}\in\mathbb{R}\) or we consider an \(AR\left(1\right)\) process \(\mathbf{X}\) with time set \(\mathbb{T}=\mathbb{Z}\), the process \(\mathbf{X}\) turns out to be adapted to the filtration \(\left(\mathcal{F}_{t}^{\mathbf{W}}\right)_{t\in\mathbb{T}}\equiv\mathfrak{F}^{\mathbf{W}}\) generated by the innovation process \(\mathbf{W}\).

Proposition 12.4 (Order of an AR(1) process) If the innovation process \(\mathbf{W}\) is a process of order \(K\), for some \(K\geq2\), then the \(AR(1)\) process \(\mathbf{X}\) is also a process of order \(K\).

Proof. Recalling that the random variables with finite moment of order \(K\) constitute a Banach space (the space \(L^{K}\), which is closed under linear combinations), the claim is an immediate consequence of Equations (12.9) and (12.10).

Proposition 12.5 (Independence of an AR(1) process from future state innovations) The random variables \(X_{1},\dots,X_{t}\) in the \(AR\left(1\right)\) process \(\mathbf{X}\) are independent of the random variables \(W_{t+1},W_{t+2},\dots\) of the innovation process \(\mathbf{W}\), for every \(t\in\mathbb{N}\).

Proposition 12.6 (Markov property of an AR(1) process) The \(AR\left(1\right)\) process \(\mathbf{X}\) is a Markov process, that is, \[\begin{equation} \mathbf{P}\left(X_{t}\in B\mid\mathcal{F}_{s}^{X_{0},\mathbf{W}}\right) =\mathbf{P}\left(X_{t}\in B\mid\sigma\left(X_{s}\right)\right), \tag{12.12} \end{equation}\] for every \(B\in\mathcal{B}\left(\mathbb{R}\right)\) and all \(s,t\in\mathbb{N}_{0}\) such that \(0\leq s<t\).

Proof. Considering Equation (12.11), we can write \[\begin{equation} \mathbf{P}\left(X_{t}\in B\mid\mathcal{F}_{s}^{X_{0},\mathbf{W}}\right)=\mathbf{P}\left(\phi^{t-s}X_{s}+g\left(s,t;\alpha,\beta\right) +\sum_{r=s+1}^{t}\phi^{t-r}W_{r}\in B\mid\mathcal{F}_{s}^{X_{0},\mathbf{W}}\right), \tag{12.13} \end{equation}\] for every \(B\in\mathcal{B}\left(\mathbb{R}\right)\) and every \(t>s\). Now, under the assumption that \(\mathbf{W}\) is a strong white noise, the random variable \(\sum_{r=s+1}^{t}\phi^{t-r}W_{r}\) is independent of \(\mathcal{F}_{s}^{X_{0},\mathbf{W}}\) and \(\phi^{t-s}X_{s}+g\left(s,t;\alpha,\beta\right)\) is clearly \(\sigma\left(X_{s}\right)\)-measurable. Therefore \[\begin{align} & \mathbf{P}\left(\phi^{t-s}X_{s}+g\left(s,t;\alpha,\beta\right)+ \sum_{r=s+1}^{t}\phi^{t-r}W_{r}\in B\mid\mathcal{F}_{s}^{X_{0},\mathbf{W}}\right)\\ & =\mathbf{P}\left(\phi^{t-s}X_{s}+g\left(s,t;\alpha,\beta\right) +\sum_{r=s+1}^{t}\phi^{t-r}W_{r}\in B\mid\sigma\left(X_{s}\right)\right), \tag{12.14} \end{align}\] for every \(B\in\mathcal{B}\left(\mathbb{R}\right)\). Combining Equations (12.13) and (12.14), we obtain the Markov property.

Proposition 12.7 (Mean Function of AR(1) Processes) The mean function \(\mu_{\mathbf{X}}:\mathbb{N}_{0}\rightarrow\mathbb{R}\) of \(X\) is given by \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\left\{ \begin{array} [c]{ll} \mu_{X_{0}}, & \text{if }t=0,\\ \mu_{X_{0}}\phi^{t}+\frac{\alpha\left(1-\phi\right)-\phi\beta}{\left(1-\phi\right)^{2}}\left(1-\phi^{t}\right) +\frac{\beta}{1-\phi}t, & \text{if }t>0, \end{array} \right. \tag{12.15} \end{equation}\] for every \(t\in\mathbb{N}_{0}\).

Proof. Equation (12.15) is evident if \(t=0\). Hence, consider the case \(t>0\). On account of Equation (12.9), since \(W\) is a white noise, we have \[\begin{align} \mu_{X}\left(t\right) & =\mathbf{E}\left[X_{t}\right] =\mathbf{E}\left[\phi^{t}X_{0}+g\left(t;\alpha,\beta\right)+\sum_{s=1}^{t}\phi^{t-s}W_{s}\right]\\ & =\phi^{t}\mathbf{E}\left[X_{0}\right]+g\left(t;\alpha,\beta\right) +\sum_{s=1}^{t}\phi^{t-s}\mathbf{E}\left[W_{s}\right]\\ & =\phi^{t}\mu_{X_{0}}+\alpha\frac{1-\phi^{t}}{1-\phi} +\beta\frac{t-\left(t+1\right)\phi+\phi^{t+1}}{\left(1-\phi\right)^{2}}\\ & =\frac{\alpha}{1-\phi}-\frac{\beta\phi}{\left(1-\phi\right)^{2}} +\frac{\beta}{1-\phi}t+\left(\mu_{X_{0}}-\frac{\alpha}{1-\phi}+\frac{\beta\phi}{\left(1-\phi\right)^{2}}\right)\phi^{t}\\ & =\frac{\alpha\left(1-\phi\right)-\beta\phi}{\left(1-\phi\right)^{2}} +\frac{\beta}{1-\phi}t+\left(\mu_{X_{0}}-\frac{\alpha\left(1-\phi\right)-\beta\phi}{\left(1-\phi\right)^{2}}\right)\phi^{t}, \end{align}\] as desired.
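Taking expectations in the recursion gives \(\mu_{\mathbf{X}}\left(t\right)=\alpha+\beta t+\phi\,\mu_{\mathbf{X}}\left(t-1\right)\), so the closed form (12.15) can also be verified numerically against this recursion. A minimal Python sketch with illustrative parameter values (not part of the original notes):

```python
alpha, beta, phi, mu0 = 0.5, 0.1, 0.7, 2.0  # illustrative values

def mu_closed(t):
    # Equation (12.15), case t > 0.
    A = (alpha * (1 - phi) - phi * beta) / (1 - phi) ** 2
    return mu0 * phi ** t + A * (1 - phi ** t) + beta / (1 - phi) * t

mu = mu0
for t in range(1, 200):
    mu = alpha + beta * t + phi * mu  # E[X_t] = alpha + beta*t + phi*E[X_{t-1}]
    assert abs(mu - mu_closed(t)) < 1e-8
```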

Proposition 12.8 (Variance Function of AR(1) Processes) The variance function \(\sigma_{\mathbf{X}}^{2}:\mathbb{N}_{0}\rightarrow\mathbb{R}_{+}\) of \(\mathbf{X}\) is given by \[\begin{equation} \sigma_{\mathbf{X}}^{2}\left(t\right)=\left\{ \begin{array} [c]{ll} \sigma_{X_{0}}^{2}+t\sigma_{\mathbf{W}}^{2}, & \text{if }\phi=-1,\\ \left(\sigma_{X_{0}}^{2}-\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right)\phi^{2t} +\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}, & \text{if }\phi\neq-1, \end{array} \right. \tag{12.16} \end{equation}\] for every \(t\in\mathbb{N}_{0}\).

Proof. Equation (12.16) is evident if \(t=0\). When \(t>0\), considering the properties of the variance operator and that the random variables in \(\mathbf{W}\) are uncorrelated, a straightforward computation from Equation (12.9) yields \[\begin{align} \mathbf{D}^{2}\left[X_{t}\right] & =\mathbf{D}^{2}\left[\phi^{t}X_{0}+g\left(t;\alpha,\beta\right)+\sum_{s=1}^{t}\phi^{t-s}W_{s}\right]\\ & =\phi^{2t}\mathbf{D}^{2}\left[X_{0}\right]+\sum_{s=1}^{t}\phi^{2\left(t-s\right)}\mathbf{D}^{2}\left[W_{s}\right]\\ & =\sigma_{X_{0}}^{2}\phi^{2t}+\sigma_{\mathbf{W}}^{2}\sum_{s=1}^{t}\left(\phi^{2}\right)^{t-s}. \tag{12.17} \end{align}\] Now, we have \[\begin{equation} \sum_{s=1}^{t}\left(\phi^{2}\right)^{t-s}=\left\{ \begin{array} [c]{ll} \frac{1-\phi^{2t}}{1-\phi^{2}}, & \text{if }\phi\neq-1,\\ t, & \text{if }\phi=-1. \end{array} \right. \end{equation}\] Replacing the latter in (12.17), the desired (12.16) follows.
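Since \(W_{t}\) is uncorrelated with \(X_{t-1}\), the recursion also gives \(\sigma_{\mathbf{X}}^{2}\left(t\right)=\phi^{2}\sigma_{\mathbf{X}}^{2}\left(t-1\right)+\sigma_{\mathbf{W}}^{2}\); both branches of (12.16) can be checked against it numerically. A short Python sketch with illustrative values (not part of the original notes):

```python
phi, sigma0_sq, sigmaW_sq = 0.7, 1.5, 1.0  # illustrative values

def var_closed(t):
    # Equation (12.16), case phi != -1.
    limit = sigmaW_sq / (1 - phi ** 2)
    return (sigma0_sq - limit) * phi ** (2 * t) + limit

v = sigma0_sq
for t in range(1, 200):
    v = phi ** 2 * v + sigmaW_sq  # D^2[X_t] = phi^2 D^2[X_{t-1}] + sigma_W^2
    assert abs(v - var_closed(t)) < 1e-10

# Degenerate case phi = -1: the variance grows linearly, sigma0^2 + t*sigmaW^2.
v = sigma0_sq
for t in range(1, 50):
    v = (-1.0) ** 2 * v + sigmaW_sq
    assert abs(v - (sigma0_sq + t * sigmaW_sq)) < 1e-9
```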

Proposition 12.9 (Autocovariance and Autocorrelation Function of AR(1) Processes) The autocovariance function \(\gamma_{\mathbf{X}}:\mathbb{N\times N}\rightarrow\mathbb{R}\) of \(\mathbf{X}\) is given by \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} \left(-1\right)^{t-s}\left(\sigma_{X_{0}}^{2}+s\sigma_{\mathbf{W}}^{2}\right), & \text{if }\phi=-1,\\ \left(\left(\sigma_{X_{0}}^{2}-\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right)\phi^{2s} +\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right)\phi^{t-s}, & \text{if }\phi\neq-1, \end{array} \right. \tag{12.18} \end{equation}\] and the autocorrelation function \(\rho_{\mathbf{X}}:\mathbb{N\times N}\rightarrow\mathbb{R}\) of \(\mathbf{X}\) is given by \[\begin{equation} \rho_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} \left(-1\right)^{t-s}\sqrt{\frac{s}{t}\frac{\left(\frac{1}{s}\sigma_{X_{0}}^{2}+\sigma_{\mathbf{W}}^{2}\right)} {\left(\frac{1}{t}\sigma_{X_{0}}^{2}+\sigma_{\mathbf{W}}^{2}\right)}}, & \text{if }\phi=-1,\\ \frac{\left(\phi^{2s}\left(\sigma_{X_{0}}^{2}-\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right) +\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right)^{1/2}} {\left(\phi^{2t}\left(\sigma_{X_{0}}^{2}-\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right) +\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right)^{1/2}}\phi^{t-s}, & \text{if }\phi\neq-1, \end{array} \right. \tag{12.19} \end{equation}\] for all \(s,t\in\mathbb{N}\ \)such that \(s<t\).

Proof. Considering the properties of the covariance functional, Equation (12.11) implies \[\begin{align} \gamma_{\mathbf{X}}\left(s,t\right) & =Cov\left(X_{s},X_{t}\right) =Cov\left(X_{s},\phi^{t-s}X_{s}+g\left(s,t;\alpha,\beta\right)+\sum_{r=s+1}^{t}\phi^{t-r}W_{r}\right)\\ & =\phi^{t-s}Cov\left(X_{s},X_{s}\right)+\sum_{r=s+1}^{t}\phi^{t-r}Cov\left(X_{s},W_{r}\right), \tag{12.20} \end{align}\] for all \(s,t\in\mathbb{N}\) such that \(s<t\). On the other hand, by Equation (12.9), \[\begin{align} Cov\left(X_{s},W_{r}\right) & = Cov\left(\phi^{s}X_{0}+g\left(s;\alpha,\beta\right)+\sum_{q=1}^{s}\phi^{s-q}W_{q},W_{r}\right)\\ & =\phi^{s}Cov\left(X_{0},W_{r}\right)+\sum_{q=1}^{s}\phi^{s-q}Cov\left(W_{q},W_{r}\right)\\ & =0, \tag{12.21} \end{align}\] because \(X_{0}\) is uncorrelated with the random variables in \(\mathbf{W}\), \(q\leq s<r\), and \(\mathbf{W}\) is a white noise. Combining (12.20) and (12.21), it follows that \[\begin{equation} Cov\left(X_{s},X_{t}\right)=\phi^{t-s}\mathbf{D}^{2}\left[X_{s}\right]. \tag{12.22} \end{equation}\] The latter, on account of (12.16), yields the desired (12.18). As a consequence of (12.22), we have \[\begin{equation} \rho_{\mathbf{X}}\left(s,t\right)=Corr\left(X_{s},X_{t}\right) =\frac{Cov\left(X_{t},X_{s}\right)}{\mathbf{D}\left[X_{t}\right]\mathbf{D}\left[X_{s}\right]} =\frac{\phi^{t-s}\mathbf{D}^{2}\left[X_{s}\right]}{\mathbf{D}\left[X_{t}\right]\mathbf{D}\left[X_{s}\right]} =\phi^{t-s}\frac{\mathbf{D}\left[X_{s}\right]}{\mathbf{D}\left[X_{t}\right]}. \tag{12.23} \end{equation}\] On the other hand, still on account of (12.16), we obtain \[\begin{equation} \phi^{t-s}\frac{\mathbf{D}\left[ X_{s}\right] }{\mathbf{D}\left[ X_{t}\right]}=\left\{ \begin{array} [c]{ll} \frac{\left(\phi^{2s}\left(\sigma_{X_{0}}^{2} -\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right) +\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right)^{1/2}} {\left(\phi^{2t}\left(\sigma_{X_{0}}^{2}-\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right) +\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right)^{1/2}}\phi^{t-s}, & \text{if }\phi\neq-1,\\ \left(-1\right)^{t-s}\frac{\left(\sigma_{X_{0}}^{2}+s\sigma_{\mathbf{W}}^{2}\right)^{1/2}} {\left(\sigma_{X_{0}}^{2}+t\sigma_{\mathbf{W}}^{2}\right)^{1/2}}, & \text{if }\phi=-1. \end{array} \right. \end{equation}\] Combining Equation (12.23) with the latter, Equation (12.19) clearly follows.
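The key identity (12.22), \(Cov\left(X_{s},X_{t}\right)=\phi^{t-s}\mathbf{D}^{2}\left[X_{s}\right]\), can also be checked by Monte Carlo simulation. A Python sketch with illustrative parameters and a fixed seed (not part of the original notes); the tolerance is loose relative to the Monte Carlo standard error:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta, phi = 0.5, 0.0, 0.6   # illustrative values
sigma0, sigmaW = 1.0, 1.0
n_paths, T = 200_000, 6
s_idx, t_idx = 2, 5

# Simulate n_paths independent trajectories of X_t = alpha + beta*t + phi*X_{t-1} + W_t.
X = np.empty((n_paths, T + 1))
X[:, 0] = rng.normal(0.0, sigma0, n_paths)
for t in range(1, T + 1):
    X[:, t] = alpha + beta * t + phi * X[:, t - 1] + rng.normal(0.0, sigmaW, n_paths)

# Exact values from (12.16) and (12.22).
limit = sigmaW ** 2 / (1 - phi ** 2)
var_s = (sigma0 ** 2 - limit) * phi ** (2 * s_idx) + limit
gamma_exact = phi ** (t_idx - s_idx) * var_s

gamma_mc = np.cov(X[:, s_idx], X[:, t_idx])[0, 1]
assert abs(gamma_mc - gamma_exact) < 0.05
```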

Proposition 12.10 (Yule-Walker Equations for AR(1) Processes) We have \[\begin{equation} \gamma_{\mathbf{X}}\left(t,t\right)=\phi\gamma_{\mathbf{X}}\left(t,t-1\right)+\sigma_{\mathbf{W}}^{2}, \qquad\text{and}\qquad \gamma_{\mathbf{X}}\left(t,t-1\right)=\phi\gamma_{\mathbf{X}}\left(t-1,t-1\right), \tag{12.24} \end{equation}\] for every \(t\in\mathbb{N}\).

Proof. Considering Equation (12.1), thanks to the properties of the covariance functional, we can write \[\begin{align} \gamma_{\mathbf{X}}\left(t,t\right)&=Cov\left(X_{t},X_{t}\right)=Cov\left(X_{t},\alpha+\beta t+\phi X_{t-1}+W_{t}\right)\\ & =\phi Cov\left(X_{t},X_{t-1}\right)+Cov\left(X_{t},W_{t}\right). \tag{12.25} \end{align}\] On the other hand, thanks to Equation (12.9), we have \[\begin{align} Cov\left(X_{t},W_{t}\right) & =Cov\left(\phi^{t}X_{0}+g\left(t;\alpha,\beta\right)+\sum_{s=1}^{t}\phi^{t-s}W_{s},W_{t}\right)\\ & =\phi^{t}Cov\left(X_{0},W_{t}\right)+\sum_{s=1}^{t-1}\phi^{t-s}Cov\left(W_{s},W_{t}\right)+Cov\left(W_{t},W_{t}\right), \tag{12.26} \end{align}\] for every \(t\in\mathbb{N}\). Now, \(X_{0}\) is independent of the random variables in \(\mathbf{W}\) and \(\mathbf{W}\) is a weak white noise. Therefore, \[\begin{equation} Cov\left(X_{0},W_{t}\right)=0\qquad\text{and}\qquad Cov\left(W_{s},W_{t}\right)=0, \tag{12.27} \end{equation}\] for all \(s,t\in\mathbb{N}\) such that \(s<t\). Combining Equations (12.25)-(12.27), we obtain the first equation in (12.24). Similarly, still considering Equation (12.1), we can write \[\begin{align} Cov\left(X_{t-1},X_{t}\right)& =Cov\left(X_{t-1},\alpha+\beta t+\phi X_{t-1}+W_{t}\right)\\ & =\phi Cov\left(X_{t-1},X_{t-1}\right)+Cov\left(X_{t-1},W_{t}\right) \tag{12.28} \end{align}\] and, applying again Equation (12.9), a computation analogous to that yielding Equation (12.26) yields \[\begin{align} Cov\left(X_{t-1},W_{t}\right) & =Cov\left(\phi^{t-1}X_{0}+g\left(t-1;\alpha,\beta\right)+\sum_{s=1}^{t-1}\phi^{t-1-s}W_{s},W_{t}\right)\\ & =\phi^{t-1}Cov\left(X_{0},W_{t}\right)+\sum_{s=1}^{t-1}\phi^{t-1-s}Cov\left(W_{s},W_{t}\right). \tag{12.29} \end{align}\] For the same reasons as in (12.27), we clearly have \[\begin{equation} Cov\left(X_{t-1},W_{t}\right)=0, \tag{12.30} \end{equation}\] for every \(t\in\mathbb{N}\). Hence, combining Equations (12.28)-(12.30), the second equation in (12.24) follows.
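The two Yule-Walker relations in (12.24) can be verified on simulated data: across many independent trajectories, the sample moments at a fixed time \(t\) must satisfy them up to Monte Carlo error. A Python sketch (illustrative parameters, fixed seed; not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2024)
alpha, beta, phi = 0.5, 0.1, 0.7   # illustrative values
sigma0, sigmaW = 1.0, 1.0
n_paths, t_check = 200_000, 4

X_prev = rng.normal(0.0, sigma0, n_paths)  # X_0 across paths
for t in range(1, t_check + 1):
    X_new = alpha + beta * t + phi * X_prev + rng.normal(0.0, sigmaW, n_paths)
    if t == t_check:
        break                 # keep X_prev = X_{t-1}, X_new = X_t
    X_prev = X_new

var_t = np.var(X_new, ddof=1)              # sample gamma(t, t)
cov_t_tm1 = np.cov(X_new, X_prev)[0, 1]    # sample gamma(t, t-1)
var_tm1 = np.var(X_prev, ddof=1)           # sample gamma(t-1, t-1)

# gamma(t,t) = phi*gamma(t,t-1) + sigma_W^2 and gamma(t,t-1) = phi*gamma(t-1,t-1).
assert abs(var_t - (phi * cov_t_tm1 + sigmaW ** 2)) < 0.05
assert abs(cov_t_tm1 - phi * var_tm1) < 0.05
```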

Proposition 12.11 (Gaussianity of an AR(1) processes) Assume that \(X_{0}\) is Gaussian, possibly degenerate. In addition, assume that the innovation process \(\mathbf{W}\) is Gaussian, in symbols \(\mathbf{W}\sim GWN(\sigma_{\mathbf{W}}^{2})\). Then we have \[\begin{equation} X_{t}\sim N\left(\mu_{\mathbf{X}}\left(t\right),\sigma_{\mathbf{X}}^{2}\left(t\right)\right), \tag{12.31} \end{equation}\] for every \(t\in\mathbb{N}\), where \(\mu_{\mathbf{X}}\left(t\right)\) and \(\sigma_{\mathbf{X}}^{2}\left(t\right)\) are given by (12.15) and (12.16), respectively. In addition, the process \(\mathbf{X}\) is Gaussian. As a consequence, \(\mathbf{X}\) has also Gaussian increments.

Proof. Thanks to the independence and Gaussianity of the random variables on the right-hand side of Equation (12.9), Equation (12.31) immediately follows. Moreover, in case \(X_{0}\) is degenerate, referring Equation (12.9) to the standardized random variables in \(\mathbf{W}\) and considering the characterization of Gaussian processes in terms of linear combinations of their coordinates, the Gaussianity of the \(AR(1)\) process \(\mathbf{X}\) immediately follows. In case \(X_{0}\) is not degenerate, but is independent of the random variables in \(\mathbf{W}\), referring Equation (12.9) to the standardizations of \(X_{0}\) and of the random variables in \(\mathbf{W}\), and considering again the same characterization, we obtain the Gaussianity of the \(AR(1)\) process \(\mathbf{X}\). In the end, the Gaussianity of the increments of \(\mathbf{X}\) follows from the fact that a Gaussian process has Gaussian increments.
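Proposition 12.11 can be illustrated by simulation: with Gaussian \(X_{0}\) and Gaussian innovations, the standardized marginal \(\left(X_{t}-\mu_{\mathbf{X}}\left(t\right)\right)/\sigma_{\mathbf{X}}\left(t\right)\) should behave like a standard normal. A Python sketch (illustrative parameters, fixed seed; not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, phi = 0.5, 0.1, 0.7       # illustrative values
mu0, sigma0, sigmaW = 2.0, 1.0, 1.0
n_paths, t_check = 200_000, 5

X = rng.normal(mu0, sigma0, n_paths)   # Gaussian initial state across paths
for t in range(1, t_check + 1):
    X = alpha + beta * t + phi * X + rng.normal(0.0, sigmaW, n_paths)

# Exact mean and variance from (12.15) and (12.16).
A = (alpha * (1 - phi) - phi * beta) / (1 - phi) ** 2
mu_t = mu0 * phi ** t_check + A * (1 - phi ** t_check) + beta / (1 - phi) * t_check
limit = sigmaW ** 2 / (1 - phi ** 2)
var_t = (sigma0 ** 2 - limit) * phi ** (2 * t_check) + limit

z = (X - mu_t) / np.sqrt(var_t)
assert abs(z.mean()) < 0.02
assert abs(z.var() - 1.0) < 0.03
assert abs((z ** 3).mean()) < 0.05     # skewness of a Gaussian is 0
```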

Definition 12.3 (Gaussian AR(1) Processes) In light of Proposition 12.11, we call Gaussian an \(AR(1)\) process with Gaussian initial state \(X_{0}\) and Gaussian state innovation \(\mathbf{W}\).

So far, we have obtained our results on \(AR\left(1\right)\) processes considering \(\phi\neq1\). However, by virtue of Equation (12.9), it is not difficult to recognize that when \(\left\vert\phi\right\vert >1\) the impact of past state innovations, represented by the random variables in \(\mathbf{W}\), on the current evolution of the random variables in the process \(\mathbf{X}\) increases over time. This circumstance is of little interest for real-world modeling: in most evolutions of real-world variables, the impact of past shocks diminishes over time and eventually becomes negligible. In addition, when \(\phi=-1\), we are not aware of real-world models requiring the application of the corresponding \(AR\left(1\right)\) process. For these reasons, we introduce the following assumption.

Assume that we have \[\begin{equation} \left\vert \phi\right\vert <1. \tag{12.32} \end{equation}\]

Definition 12.4 (Causality assumption for AR(1) processes) The condition \(\left\vert \phi\right\vert <1\) is usually referred to as the causality assumption.

Assume also that we have \[\begin{equation} \beta=0 \tag{12.33} \end{equation}\]

Proposition 12.12 (Asymptotic weak stationarity for AR(1) processes) Under Assumptions (12.32) and (12.33), we have \[\begin{equation} \lim_{t\rightarrow+\infty}\mu_{\mathbf{X}}\left(t\right) =\frac{\alpha}{1-\phi}, \qquad\lim_{t\rightarrow+\infty}\sigma_{\mathbf{X}}^{2}\left(t\right) =\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}. \tag{12.34} \end{equation}\] In addition, \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)\simeq\frac{\phi^{t-s}}{1-\phi^{2}}\sigma_{\mathbf{W}}^{2} \quad\text{and}\quad \rho_{\mathbf{X}}\left(s,t\right)\simeq\phi^{t-s}, \tag{12.35} \end{equation}\] for all \(s,t\in\mathbb{N}\) such that \(1\ll s<t\).
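The limits in (12.34) can be checked by iterating the mean and variance recursions implied by (12.2) with \(\beta=0\): both converge geometrically to \(\alpha/\left(1-\phi\right)\) and \(\sigma_{\mathbf{W}}^{2}/\left(1-\phi^{2}\right)\) from an arbitrary initial condition. A minimal Python sketch (illustrative values; not part of the original notes):

```python
alpha, phi, sigmaW_sq = 0.5, 0.7, 1.0   # beta = 0, |phi| < 1 (causality)
mu, var = 10.0, 25.0                    # arbitrary initial mean and variance

for t in range(1, 500):
    mu = alpha + phi * mu               # mean recursion with beta = 0
    var = phi ** 2 * var + sigmaW_sq    # variance recursion

assert abs(mu - alpha / (1 - phi)) < 1e-10
assert abs(var - sigmaW_sq / (1 - phi ** 2)) < 1e-10
```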

The structure of Equations (12.15)-(12.19) suggests a characterization of the initial state \(X_{0}\) of the \(AR(1)\) process \(\mathbf{X}\) which implies important properties.

Definition 12.5 (Steady state of an AR(1) process) We say that the initial state \(X_{0}\) is a steady state of the \(AR(1)\) process \(\mathbf{X}\) if we have \[\begin{equation} \mu_{X_{0}}=\frac{\alpha}{1-\phi} \quad\text{and}\quad \sigma_{X_{0}}^{2}=\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}. \tag{12.36} \end{equation}\]

Proposition 12.13 (Steady State of an AR(1) Process) Under Assumptions (12.32) and (12.33), assume further that \(X_{0}\) is a steady state of the \(AR(1)\) process \(X\). Then we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\frac{\alpha}{1-\phi} \quad\text{and}\quad \sigma_{\mathbf{X}}^{2}\left(t\right)=\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}} \tag{12.37} \end{equation}\] for every \(t\in\mathbb{N}_{0}\). Moreover, \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\frac{\phi^{t-s}\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}} \quad\text{and}\quad \rho_{\mathbf{X}}\left(s,t\right)=\phi^{t-s}, \tag{12.38} \end{equation}\] for all \(s,t\in\mathbb{N}_{0}\) such that \(s<t\).

Proposition 12.14 (Weak stationarity of an AR(1) process) Under Assumptions (12.32) and (12.33), assume further that \(X_{0}\) is a steady state of the \(AR(1)\) process \(\mathbf{X}\). Then \(\mathbf{X}\) is weak sense stationary. In particular, we can consider the reduced autocovariance and autocorrelation function of the process \(\mathbf{X}\) referred to \(0\), which are given by \[\begin{equation} \gamma_{\mathbf{X},0}\left(t\right)=\frac{\phi^{t}\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}} \quad\text{and}\quad \rho_{\mathbf{X},0}\left(t\right)=\phi^{t}, \end{equation}\] for every \(t\in\mathbb{N}_{0}\).

Proof. Considering the definition of weak sense stationarity, the weak stationarity is an immediate consequence of Equations (12.37) and (12.38).
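Numerically, the steady state (12.36) is the fixed point of the mean and variance recursions: starting from it, the moments do not move, which is the content of Proposition 12.13 and of the weak stationarity above. A minimal Python sketch (illustrative values; not part of the original notes):

```python
alpha, phi, sigmaW_sq = 0.5, 0.7, 1.0   # beta = 0, |phi| < 1

mu = alpha / (1 - phi)                  # steady-state mean, (12.36)
var = sigmaW_sq / (1 - phi ** 2)        # steady-state variance, (12.36)

for t in range(1, 100):
    mu_next = alpha + phi * mu              # mean recursion with beta = 0
    var_next = phi ** 2 * var + sigmaW_sq   # variance recursion
    assert abs(mu_next - mu) < 1e-12
    assert abs(var_next - var) < 1e-12
    mu, var = mu_next, var_next
```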

Proposition 12.15 (Ergodicity of AR(1) Processes) Under Assumptions (12.32) and (12.33), assume further that \(X_{0}\) is a steady state of the \(AR(1)\) process \(\mathbf{X}\). Then \(\mathbf{X}\) is mean-square ergodic in the mean. Moreover, if \(X_{0}\) has finite moment of order \(4\) and the state innovation \(\mathbf{W}\) is also a process of order \(4\), then \(\mathbf{X}\) is mean-square ergodic in the wide sense.

Proof. Under Assumption \(\ref{AR(1)-weak-stationarity-Ass.}\), we have \[ \lim_{t\rightarrow\infty}\gamma_{X,0}\left(t\right)=\lim_{t\rightarrow \infty}\frac{\phi^{t}\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}=0. \] Hence, the mean-square ergodicity in the mean follows from Theorem \(\ref{mean-square-ergodicity-in-the mean-Theor.}\). Now, under the assumption that \(X_{0}\) has finite \(4\)th moment and the innovation \(W\) is a \(4\)th order strong white noise, with reference to the case \(k=0\), we have% \[\begin{align} & Cov\left(X_{0}X_{k},X_{t}X_{t+k}\right)\\ & =Cov\left(X_{0}^{2},X_{t}^{2}\right)=Cov\left(X_{0}^{2},\left( \alpha+\phi X_{t-1}+W_{t}\right)^{2}\right)\\ & =Cov\left(X_{0}^{2},\alpha^{2}+\phi^{2}X_{t-1}^{2}+W_{t}^{2}+2\alpha\phi X_{t-1}+2\alpha W_{t}+2\phi X_{t-1}W_{t}\right)\\ & =\phi^{2}Cov\left(X_{0}^{2},X_{t-1}^{2}\right)+Cov\left(X_{0}% ^{2},W_{t}^{2}\right)+2\alpha\phi Cov\left(X_{0}^{2},X_{t-1}\right) +2\alpha Cov\left(X_{0}^{2},W_{t}\right)+2\phi Cov\left(X_{0}% ^{2},X_{t-1}W_{t}\right), \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-01}% \end{align}\] where% \[\begin{equation} Cov\left(X_{0}^{2},W_{t}^{2}\right)=\mathbf{E}\left[ X_{0}^{2}W_{t}% ^{2}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{t}% ^{2}\right] =\mathbf{E}\left[ W_{t}^{2}\right] \mathbf{E}\left[ W_{t}% ^{2}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{t}% ^{2}\right] =0, \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-02}% \end{equation}\]% \[\begin{equation} Cov\left(X_{0}^{2},W_{t}\right)=\mathbf{E}\left[ X_{0}^{2}W_{t}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{t}\right] =\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{t}\right] =0, \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-03}% \end{equation}\] and, by virtue of Corollary \(\ref{AR(1)-independence-Cor.}\), \[\begin{align} Cov\left(X_{0}^{2},X_{t-1}W_{t}\right) & =\mathbf{E}\left[ X_{0}% ^{2}X_{t-1}W_{t}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}% \left[ 
X_{t-1}W_{t}\right]\\ & =\mathbf{E}\left[ X_{0}^{2}X_{t-1}\right] \mathbf{E}\left[ W_{t}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ X_{t-1}\right] \mathbf{E}\left[ W_{t}\right] =0. \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-04}% \end{align}\] In addition, for \(t>1\), \[\begin{align} Cov\left(X_{0}^{2},X_{t-1}\right) & =Cov\left(X_{0}^{2},\phi^{t-1}% X_{0}+\alpha\frac{1-\phi^{t-1}}{1-\phi}+% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{t-1-s}W_{s}\right)\\ & =Cov\left(X_{0}^{2},\phi^{t-1}X_{0}\right)+% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{t-1-s}Cov\left(X_{0}^{2},W_{s}\right)\\ & =\phi^{t-1}\mathbf{E}\left[ X_{0}^{3}\right] -\phi^{t-1}\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ X_{0}\right]\\ & =\phi^{t-1}\left(\mathbf{E}\left[ X_{0}^{3}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ X_{0}\right] \right) \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-05}% \end{align}\] and, since% \[\begin{align} X_{t-1}^{2} & =\left(\phi^{t-1}X_{0}+\alpha\frac{1-\phi^{t-1}}{1-\phi}+% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{t-1-s}W_{s}\right)^{2}\\ & =\phi^{2\left(t-1\right)}X_{0}^{2}+\alpha^{2}\left(\frac{1-\phi ^{t-1}}{1-\phi}\right)^{2}+% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{2\left(t-1-s\right)}W_{s}^{2}+2\alpha\frac{1-\phi^{t-1}}{1-\phi}% \phi^{t-1}X_{0}\\ & +2\phi^{t-1}% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{t-1-s}X_{0}W_{s}+2\alpha\frac{1-\phi^{t-1}}{1-\phi}% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{t-1-s}W_{s}+2% %TCIMACRO{\tsum _{\substack{s,r=1\\r<s}}^{t-1}}% %BeginExpansion {\sum_{\substack{s,r=1\\r<s}}^{t-1}} %EndExpansion \phi^{\left(t-1-r\right)}\phi^{\left(t-1-s\right)}W_{r}W_{s}, \end{align}\] we have% \[\begin{align} Cov\left(X_{0}^{2},X_{t-1}^{2}\right) 
& =\phi^{2\left(t-1\right) }Cov\left(X_{0}^{2},X_{0}^{2}\right)+% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{2\left(t-1-s\right)}Cov\left(X_{0}^{2},W_{s}^{2}\right) \nonumber\\ & +2\alpha\frac{1-\phi^{t-1}}{1-\phi}\phi^{t-1}Cov\left(X_{0}^{2}% ,X_{0}\right)+2\phi^{t-1}% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{t-1-s}Cov\left(X_{0}^{2},X_{0}W_{s}\right)\\ & +2\alpha\frac{1-\phi^{t-1}}{1-\phi}% %TCIMACRO{\tsum _{s=1}^{t-1}}% %BeginExpansion {\sum_{s=1}^{t-1}} %EndExpansion \phi^{t-1-s}Cov\left(X_{0}^{2},W_{s}\right)+2% %TCIMACRO{\tsum _{\substack{s,r=1\\r<s}}^{t-1}}% %BeginExpansion {\sum_{\substack{s,r=1\\r<s}}^{t-1}} %EndExpansion \phi^{\left(t-1-r\right)}\phi^{\left(t-1-s\right)}Cov\left(X_{0}% ^{2},W_{r}W_{s}\right). \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-06}% \end{align}\] Considering that for \(r,s=1,\dots,t-1\), \(r<s\), the random variable \(X_{0}\) is independent of \(W_{r}\) and \(W_{s}\), which are independent of each other, we have \[\begin{equation} Cov\left(X_{0}^{2},W_{s}^{2}\right)=\mathbf{E}\left[ X_{0}^{2}W_{s}% ^{2}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{s}% ^{2}\right] =\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{s}% ^{2}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{s}% ^{2}\right] =0, \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-07}% \end{equation}\]% \[\begin{equation} Cov\left(X_{0}^{2},X_{0}W_{s}\right)=\mathbf{E}\left[ X_{0}^{3}% W_{s}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ X_{0}W_{s}\right] =\mathbf{E}\left[ X_{0}^{3}\right] \mathbf{E}\left[ W_{s}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ X_{0}\right] \mathbf{E}\left[ W_{s}\right] =0, \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-08}% \end{equation}\]% \[\begin{equation} Cov\left(X_{0}^{2},W_{s}\right)=\mathbf{E}\left[ X_{0}^{2}W_{s}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{s}\right] 
=\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{s}\right] =0, \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-09}% \end{equation}\] and% \[\begin{align} Cov\left(X_{0}^{2},W_{r}W_{s}\right) & =\mathbf{E}\left[ X_{0}^{2}% W_{r}W_{s}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{r}W_{s}\right]\\ & =\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{r}W_{s}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{r}\right] \mathbf{E}\left[ W_{s}\right] =\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ W_{r}\right] \mathbf{E}\left[ W_{s}\right] =0. \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-10}% \end{align}\] Combining (\(\ref{AR(1)-wide-sense-ergodicity-Prop.-Proof-06}\)% )-(\(\ref{AR(1)-wide-sense-ergodicity-Prop.-Proof-10}\)), we obtain% \[\begin{equation} Cov\left(X_{0}^{2},X_{t-1}^{2}\right)=\phi^{2\left(t-1\right) }\mathbf{D}^{2}\left[ X_{0}^{2}\right] +2\alpha\frac{1-\phi^{t-1}}{1-\phi }\phi^{t-1}\left(\mathbf{E}\left[ X_{0}^{3}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}\left[ X_{0}\right] \right). \label{AR(1)-wide-sense-ergodicity-Prop.-Proof-11}% \end{equation}\] Combining (\(\ref{AR(1)-wide-sense-ergodicity-Prop.-Proof-01}\)% )-(\(\ref{AR(1)-wide-sense-ergodicity-Prop.-Proof-05}\)) and (\(\ref{AR(1)-wide-sense-ergodicity-Prop.-Proof-11}\)), it then follows,% \[ Cov\left(X_{0}X_{k},X_{t}X_{t+k}\right)=\phi^{t-1}\left(\mathbf{E}% \left[ X_{0}^{3}\right] -\mathbf{E}\left[ X_{0}^{2}\right] \mathbf{E}% \left[ X_{0}\right] \right)+\phi^{2\left(t-1\right)}\mathbf{D}% ^{2}\left[ X_{0}^{2}\right] +2\alpha\frac{1-\phi^{t-1}}{1-\phi}\phi ^{t-1}\left(\mathbf{E}\left[ X_{0}^{3}\right] -\mathbf{E}\left[ X_{0}% ^{2}\right] \mathbf{E}\left[ X_{0}\right] \right). \] Therefore,% \[ \lim_{t\rightarrow+\infty}Cov\left(X_{0}X_{k},X_{t}X_{t+k}\right)=0. 
\] Now, for \(k>0\), we have% \[ X_{0}X_{k}=X_{0}\left(\alpha+\phi X_{k-1}+W_{k}\right)=\alpha X_{0}+\phi X_{t+k-1}+X_{0}W_{k}% \] and% \[\begin{align} X_{t}X_{t+k} & =\left(\alpha+\phi X_{t-1}+W_{t}\right)\left( \alpha+\phi X_{t+k-1}+W_{t+k}\right) \\ & =\alpha^{2}+\alpha\left(W_{t}+W_{t+k}\right)+\alpha\phi\left( X_{t-1}+W_{t+k-1}\right)+\phi\left(X_{t-1}W_{t+k}+X_{t+k-1}W_{t}\right) \\ & +\phi^{2}X_{t-1}X_{t+k-1}+W_{t}W_{t+k}. \end{align}\] As a consequence,% \[\begin{align} & Cov\left(X_{0}X_{k},X_{t}X_{k+t}\right) \\ & =\alpha^{2}\left(Cov\left(X_{0},W_{t}\right)+Cov\left(X_{0}% ,W_{t+k}\right)\right)+\alpha^{2}\phi\left(Cov\left(X_{0}% ,X_{t-1}\right)+Cov\left(X_{0},W_{t+k-1}\right)\right) \\ & +\alpha\phi\left(Cov\left(X_{0},X_{t-1}W_{t+k}\right)+Cov\left( X_{0},X_{t+k-1}W_{t}\right)\right)+\alpha\phi^{2}Cov\left(X_{0}% ,X_{t-1}X_{t+k-1}\right)+\alpha Cov\left(X_{0},W_{t}W_{t+k}\right) \\ & +\alpha\phi\left(Cov\left(X_{0}X_{k-1},W_{t}\right)+Cov\left( X_{0}X_{k-1},W_{t+k}\right)\right)+\alpha\phi^{2}\left(Cov\left( X_{0}X_{k-1},X_{t-1}\right)+Cov\left(X_{0}X_{k-1},W_{t+k-1}\right) \right) \\ & +\phi^{2}\left(Cov\left(X_{0}X_{k-1},X_{t-1}W_{t+k}\right)+Cov\left( X_{0}X_{k-1},X_{t+k-1}W_{t}\right)\right)+\phi^{3}Cov\left(X_{0}% X_{k-1},X_{t-1}X_{t+k-1}\right) \\ & +\phi Cov\left(X_{0}X_{k-1},W_{t}W_{t+k}\right) \\ & +\alpha\left(Cov\left(X_{0}W_{k},W_{t}\right)+Cov\left(X_{0}% W_{k},W_{t+k}\right)\right)+\alpha\phi\left(Cov\left(X_{0}% W_{k},X_{t-1}\right)+Cov\left(X_{0}W_{k},W_{t+k-1}\right)\right) \\ & +\phi\left(Cov\left(X_{0}W_{k},X_{t-1}W_{t+k}\right)+Cov\left( X_{0}W_{k},X_{t+k-1}W_{t}\right)\right)+\phi^{2}Cov\left(X_{0}% W_{k},X_{t-1}X_{t+k-1}\right) \\ & +Cov\left(X_{0}W_{k},W_{t}W_{t+k}\right) \end{align}\] for every \(t\in\mathbb{N}\) and \(k\in\mathbb{N}\). 
Now, we clearly have% \[ Cov\left(X_{0},W_{t}\right)=Cov\left(X_{0},W_{t+k}\right)=Cov\left( X_{0},W_{t+k-1}\right)=0 \] and% \[ Cov\left(X_{0},W_{t}W_{t+k}\right)=\mathbf{E}\left[ X_{0}W_{t}% W_{t+k}\right] -\mathbf{E}\left[ X_{0}\right] \mathbf{E}\left[ W_{t}W_{t+k}\right] =\mathbf{E}\left[ X_{0}\right] \mathbf{E}\left[ W_{t}\right] \mathbf{E}\left[ W_{t+k}\right] -\mathbf{E}\left[ X_{0}\right] \mathbf{E}\left[ W_{t}\right] \mathbf{E}\left[ W_{t+k}% \right] =0 \] Furthermore, by virtue of Corollary \(\ref{AR(1)-independence-Cor.}\), since \(\max\left\{ k-1,t-1\right\} <t+k\),% \[\begin{align} Cov\left(X_{0},X_{t-1}W_{t+k}\right) & =\mathbf{E}\left[ X_{0}% X_{t-1}W_{t+k}\right] -\mathbf{E}\left[ X_{0}\right] \mathbf{E}\left[ X_{t-1}W_{t+k}\right] \\ & =\mathbf{E}\left[ X_{0}X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] -\mathbf{E}\left[ X_{0}\right] \mathbf{E}\left[ X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] =0, \end{align}\]% \[ Cov\left(X_{0}X_{k-1},W_{t}\right)=\mathbf{E}\left[ X_{0}X_{k-1}% W_{t}\right] -\mathbf{E}\left[ X_{0}X_{t-1}\right] \mathbf{E}\left[ W_{t}\right] =\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ W_{t}\right] =0, \]% \[ Cov\left(X_{0}X_{k-1},W_{t+k}\right)=\mathbf{E}\left[ X_{0}X_{k-1}% W_{t+k}\right] -\mathbf{E}\left[ X_{0}X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] =\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ W_{t+k}\right] =0, \] and% \[\begin{align} Cov\left(X_{0}X_{k-1},X_{t-1}W_{t+k}\right) & =\mathbf{E}\left[ X_{0}X_{k-1}X_{t-1}W_{t+k}\right] -\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ X_{t-1}W_{t+k}\right] \\ & =\mathbf{E}\left[ X_{0}X_{k-1}X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] -\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] =0. 
\end{align}\] We have also% \[ Cov\left(X_{0}X_{k-1},W_{t+k-1}\right)=\mathbf{E}\left[ X_{0}% X_{k-1}W_{t+k-1}\right] -\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ W_{t+k-1}\right] =\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ W_{t+k-1}\right] =0 \] Similarly, for each \(k\in\mathbb{N}\), and for \(t\) large enough% \[ Cov\left(X_{0}W_{k},W_{t}\right)=\mathbf{E}\left[ X_{0}W_{k}W_{t}\right] -\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}\left[ W_{t}\right] =\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}\left[ W_{t}\right] =0, \]% \[ Cov\left(X_{0}W_{k},W_{t+k-1}\right)=\mathbf{E}\left[ X_{0}W_{k}% W_{t+k-1}\right] -\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}\left[ W_{t+k-1}\right] =\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}\left[ W_{t+k-1}\right] =0, \]% \[ Cov\left(X_{0}W_{k},W_{t+k}\right)=\mathbf{E}\left[ X_{0}W_{k}% W_{t+k}\right] -\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}\left[ W_{t+k}\right] =\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}\left[ W_{t+k}\right] =0, \]% \[ Cov\left(X_{0}X_{k-1},W_{t}\right)=\mathbf{E}\left[ X_{0}X_{k-1}% W_{t}\right] -\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ W_{t}\right] =\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ W_{t}\right] =0, \]% \[\begin{align} Cov\left(X_{0}X_{k-1},X_{t-1}W_{t+k}\right) & =\mathbf{E}\left[ X_{0}X_{k-1}X_{t-1}W_{t+k}\right] -\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ X_{t-1}W_{t+k}\right] \\ & =\mathbf{E}\left[ X_{0}X_{k-1}X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] -\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] =0, \end{align}\]% \[\begin{align} Cov\left(X_{0}X_{k-1},W_{t}W_{t+k}\right) & =\mathbf{E}\left[ X_{0}X_{k-1}W_{t}W_{t+k}\right] -\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ W_{t}W_{t+k}\right] \\ & =\mathbf{E}\left[ X_{0}X_{k-1}W_{t}\right] \mathbf{E}\left[ W_{t+k}\right] -\mathbf{E}\left[ X_{0}X_{k-1}\right] \mathbf{E}\left[ W_{t}\right] \mathbf{E}\left[ W_{t+k}\right] 
=0, \end{align}\] \[\begin{align} Cov\left(X_{0}W_{k},X_{t-1}W_{t+k}\right) & =\mathbf{E}\left[ X_{0}% W_{k}X_{t-1}W_{t+k}\right] -\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}\left[ X_{t-1}W_{t+k}\right] \\ & =\mathbf{E}\left[ X_{0}W_{k}X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] -\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}\left[ X_{t-1}\right] \mathbf{E}\left[ W_{t+k}\right] =0, \end{align}\] \[\begin{align} Cov\left(X_{0}W_{k},W_{t}W_{t+k}\right) & =\mathbf{E}\left[ X_{0}% W_{k}W_{t}W_{t+k}\right] -\mathbf{E}\left[ X_{0}W_{k}\right] \mathbf{E}% \left[ W_{t}W_{t+k}\right] \\ & = \end{align}\] and \[\begin{align} & Cov\left(X_{0}X_{k},X_{t}X_{k+t}\right) \\ & =\alpha^{2}\phi Cov\left(X_{0},X_{t-1}\right) \\ & +\alpha\phi^{2}Cov\left(X_{0},X_{t-1}X_{t+k-1}\right) \\ & +\alpha\phi+\alpha\phi^{2}Cov\left(X_{0}X_{k-1},X_{t-1}\right) \\ & +\phi^{2}\left(+Cov\left(X_{0}X_{k-1},X_{t+k-1}W_{t}\right)\right) +\phi^{3}Cov\left(X_{0}X_{k-1},X_{t-1}X_{t+k-1}\right) \\ & +\alpha\phi\left(Cov\left(X_{0}W_{k},X_{t-1}\right)+\right) \\ & +\phi\left(+Cov\left(X_{0}W_{k},X_{t+k-1}W_{t}\right)\right) +\phi^{2}Cov\left(X_{0}W_{k},X_{t-1}X_{t+k-1}\right)+ \end{align}\] %

\[ Cov\left(X_{0},X_{t-1}\right)=\gamma_{X,0}\left(t-1\right)=\frac {\phi^{t-1}\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}. \] By virtue of the theorem on mean square ergodicity in the wide sense, the desired result follows.

Corollary 12.1 (Gaussian AR(1) process) Under Assumptions (12.32) and (12.33), assume further that \(X_{0}\) is a Gaussian steady state of the \(AR(1)\) process \(\mathbf{X}\), and that the state innovation \(\mathbf{W}\) is Gaussian, in symbols \(\mathbf{W}\sim GWN(\sigma_{\mathbf{W}}^{2})\). Then we have \[ X_{t}\sim N\left(\frac{\alpha}{1-\phi},\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}\right) \] for every \(t\in\mathbb{N}_{0}\). In addition, the process \(\mathbf{X}\) is Gaussian and mean square wide sense ergodic.
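As a quick numerical illustration of the corollary, a long simulated path of a Gaussian \(AR(1)\) process started in its steady state should exhibit a sample mean close to \(\alpha/\left(1-\phi\right)\) and a sample variance close to \(\sigma_{\mathbf{W}}^{2}/\left(1-\phi^{2}\right)\). The following minimal sketch checks this; the parameter values and the random seed are illustrative assumptions, not taken from the text.

```r
# Empirical check of the steady-state mean and variance of a Gaussian AR(1).
# The parameter values and the random seed are illustrative assumptions.
set.seed(42)
alpha <- 1.0; phi <- 0.6; sigma_W <- 2.0
n <- 100000
x <- numeric(n)
x[1] <- rnorm(1, mean=alpha/(1-phi), sd=sigma_W/sqrt(1-phi^2))  # X_0 drawn from the steady state
for (t in 2:n) x[t] <- alpha + phi*x[t-1] + rnorm(1, mean=0, sd=sigma_W)
c(sample_mean=mean(x), steady_mean=alpha/(1-phi),               # steady-state mean 2.5
  sample_var=var(x),   steady_var=sigma_W^2/(1-phi^2))          # steady-state variance 6.25
```

The sample moments should match the theoretical steady-state values up to sampling error of order \(n^{-1/2}\).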

12.1.1 Parameter Estimation

Let \(\left(x_{t}\right)_{t=1}^{T}\equiv\mathbf{x}\) be a univariate real time series, for some \(T\geq2\), and let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be an \(AR\left(1\right)\) process, which satisfies Equation (12.2), for some drift \(\alpha\in\mathbb{R}\), some linear trend coefficient \(\beta\in\mathbb{R}\), some regression coefficient \(\phi\in\mathbb{R}\), and some state innovation process \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\), for some \(\sigma_{\mathbf{W}}>0\).

We assume that the \(AR\left(1\right)\) process \(\mathbf{X}\) is a model of \(\mathbf{x}\) for suitable values of the parameters \(\alpha\), \(\beta\), \(\phi\), and \(\sigma_{\mathbf{W}}\). The goal is to determine estimates \(\hat{\alpha}_{T}\left(\omega\right)\), \(\hat{\beta}_{T}\left(\omega\right)\), \(\hat{\phi}_{T}\left(\omega\right)\), and \(\hat{\sigma}_{\mathbf{W},{T}}\left(\omega\right)\) of the parameters \(\alpha\), \(\beta\), \(\phi\), and \(\sigma_{\mathbf{W}}\), respectively, which allow the best fit of \(\mathbf{X}\) to the time series \(\mathbf{x}\).

12.1.1.1 Method of Moments

The easiest way to estimate the parameters of the \(AR\left(1\right)\) process \(\mathbf{X}\) is undoubtedly the Method of Moments (\(MM\)). To apply this method it is necessary to assume that \(\mathbf{X}\) is ergodic. This, in turn, leads us to assume that \[\begin{equation} \beta=0, \qquad \left\vert\phi\right\vert<1, \tag{12.39} \end{equation}\] and that \(X_{0}\) is a steady state of the process. Nevertheless, note that, by virtue of Proposition 12.12, in the long run, Equation (12.39) alone lets us bypass the need to deal with an initial steady state of the process. Therefore, for statistical purposes, it is customary to think of an \(AR\left(1\right)\) process \(\mathbf{X}\) satisfying Equation (12.39) as an asymptotically stationary and ergodic process. Eventually, we can think of the process \(\mathbf{X}\) as if it had already been running long enough in the past to be close, at present, to the steady state \(X_{0}\) in which Equation (12.34) is satisfied. Furthermore, we have to introduce the statistics known as the time average estimator, the time variance estimator, and the time autocorrelation estimator, given by \[\begin{equation} \bar{X}_{T}\overset{\text{def}}{=}\frac{1}{T}\sum\limits_{t=1}^{T}X_{t}, \tag{12.40} \end{equation}\] \[\begin{equation} S_{\mathbf{X},T}^{2}\overset{\text{def}}{=}\frac{1}{T}\sum\limits_{t=1}^{T}\left(X_{t}-\bar{X}_{T}\right)^{2}, \tag{12.41} \end{equation}\] and \[\begin{equation} R_{\mathbf{X},T}\left(1\right) \overset{\text{def}}{=}\frac{C_{\mathbf{X},T}\left(1\right)}{C_{\mathbf{X},T}\left(0\right)} =\frac{C_{\mathbf{X},T}\left(1\right)}{S_{\mathbf{X},T}^{2}} =\frac{\sum\limits_{t=1}^{T-1}\left(X_{t}-\bar{X}_{T}\right)\left(X_{t+1}-\bar{X}_{T}\right)} {\sum\limits_{t=1}^{T}\left(X_{t}-\bar{X}_{T}\right)^{2}}, \tag{12.42} \end{equation}\] respectively. We then have

Proposition 12.16 (MM parameter estimation for AR(1) Processes) Assume that the \(AR\left(1\right)\) process \(\mathbf{X}\) is ergodic. Then the \(MM\) estimates of the parameters \(\sigma_{\mathbf{W}}\), \(\alpha\), \(\phi\) are given by the solution of the equations \[\begin{equation} \frac{\hat{\alpha}_{T}\left(\omega\right)}{1-\hat{\phi}_{T}\left(\omega\right)} =\bar{x}_{T}, \quad \frac{\hat{\sigma}_{\mathbf{W},T}^{2}\left(\omega\right)}{1-\hat{\phi}_{T}^{2}\left(\omega\right)} =s_{\mathbf{X},T}^{2}, \quad\text{and}\quad \hat{\phi}_{T}\left(\omega\right)=r_{\mathbf{X},T}\left(1\right), \tag{12.43} \end{equation}\] where \[\begin{equation} \bar{x}_{T}\equiv\frac{1}{T}\sum\limits_{t=1}^{T}x_{t}, \qquad s^{2}_{\mathbf{X},T}\equiv\frac{1}{T}\sum\limits_{t=1}^{T}\left(x_{t}-\bar{x}_{T}\right)^{2}, \quad\text{and}\quad r_{\mathbf{X},T}\left(1\right) =\frac{\sum\limits_{t=1}^{T-1}\left(x_{t}-\bar{x}_{T}\right)\left(x_{t+1}-\bar{x}_{T}\right)} {\sum\limits_{t=1}^{T}\left(x_{t}-\bar{x}_{T}\right)^{2}} \end{equation}\] are the realizations of the time average estimator \(\bar{X}_{T}\), the time variance estimator \(S^{2}_{\mathbf{X},T}\), and the time autocorrelation estimator \(R_{\mathbf{X},T}\left(1\right)\), respectively, on the observed path \(\left(x_{t}\right)_{t=1}^{T}\) of \(\mathbf{X}\).

Proof. When the \(AR\left(1\right)\) process \(\mathbf{X}\) is ergodic, considering that \(X_{0}\) is a steady state, besides Equation (12.39), we have \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\frac{\alpha}{1-\phi}, \qquad \sigma_{\mathbf{X}}^{2}\left(t\right)=\frac{\sigma_{\mathbf{W}}^{2}}{1-\phi^{2}}, \quad\text{and}\quad \rho_{\mathbf{X},0}\left(t\right)=\phi^{t}. \tag{12.44} \end{equation}\] Hence, applying the method of moments, we can write the equations \[\begin{equation} \frac{\hat{\alpha}_{T}}{1-\hat{\phi}_{T}}=\bar{X}_{T}, \qquad \frac{\hat{\sigma}_{\mathbf{W},T}^{2}}{1-\hat{\phi}_{T}^{2}}=S_{\mathbf{X},T}^{2}, \quad\text{and}\quad \hat{\phi}_{T}=R_{\mathbf{X},T}\left(1\right). \tag{12.45} \end{equation}\] These yield the \(MM\) estimators \(\hat{\alpha}_{T}\), \(\hat{\phi}_{T}\), and \(\hat{\sigma}_{\mathbf{W},T}^{2}\) for the true values of the parameters of the process \(\alpha\), \(\phi\), and \(\sigma_{\mathbf{W}}^{2}\). Hence, in terms of the time series \(\mathbf{x}\), the estimates \(\hat{\alpha}_{T}\left(\omega\right)\), \(\hat{\phi}_{T}\left(\omega\right)\), and \(\hat{\sigma}_{\mathbf{W},T}^{2}\left(\omega\right)\) are easily computed by setting \[\begin{equation} \hat{\phi}_{T}\left(\omega\right)=r_{\mathbf{X},T}\left(1\right), \qquad \hat{\alpha}_{T}\left(\omega\right)=\left(1-\hat{\phi}_{T}\left(\omega\right)\right) \bar{x}_{T}, \quad\text{and}\quad \hat{\sigma}^{2}_{\mathbf{W},T}\left(\omega\right)=\left(1-\hat{\phi}^{2}_{T}\left(\omega\right)\right) s^{2}_{\mathbf{X},{T}}. \end{equation}\]
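The \(MM\) recipe of Equation (12.43) can be put to work directly in R. The following minimal sketch (with illustrative, assumed parameter values and seed) simulates an ergodic \(AR(1)\) path and recovers the estimates from the sample moments.

```r
# Method of Moments estimation on a simulated ergodic AR(1) path.
# Parameter values and the random seed are illustrative assumptions.
set.seed(1)
alpha <- 0.5; phi <- 0.3; sigma_W <- 2.0
T_len <- 5000
x <- numeric(T_len)
x[1] <- alpha/(1-phi)                       # start at the steady-state mean
for (t in 2:T_len) x[t] <- alpha + phi*x[t-1] + rnorm(1, mean=0, sd=sigma_W)
x_bar <- mean(x)                            # realization of (12.40)
s2 <- mean((x - x_bar)^2)                   # realization of (12.41), 1/T version
r1 <- sum((x[-T_len] - x_bar)*(x[-1] - x_bar)) / sum((x - x_bar)^2)  # (12.42)
phi_hat <- r1                               # MM estimates, solving (12.43)
alpha_hat <- (1 - phi_hat)*x_bar
sigma2_hat <- (1 - phi_hat^2)*s2
c(alpha_hat=alpha_hat, phi_hat=phi_hat, sigma2_hat=sigma2_hat)
```

With a path of this length the estimates should fall close to the true values \(\alpha=0.5\), \(\phi=0.3\), and \(\sigma_{\mathbf{W}}^{2}=4\), up to sampling error.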

12.1.2 Prediction of Future States and Prediction Intervals

Let \(\left(x_{t}\right)_{t=1}^{T}\equiv\mathbf{x}\) be a univariate real time series, for some \(T\geq2\), and let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be an ergodic \(AR\left(1\right)\) process which satisfies Equation (12.2), for some drift \(\alpha\in\mathbb{R}\), linear trend coefficient \(\beta=0\), regression coefficient \(\phi\in\left(-1,1\right)\), and state innovation process \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\) with \(\sigma_{\mathbf{W}}>0\). Assume we have determined, with some estimation method, the estimates \(\hat{\alpha}\left(\omega\right)\), \(\hat{\phi}\left(\omega\right)\), and \(\hat{\sigma}_{\mathbf{W}}\left(\omega\right)\) of the parameters \(\alpha\), \(\phi\), and \(\sigma_{\mathbf{W}}\), respectively, which allow the best fit of the \(AR\left(1\right)\) process \(\mathbf{X}\) to the time series \(\mathbf{x}\). Let \(\left(\mathcal{F}_{t}^{X_{0},\mathbf{W}}\right)_{t\in\mathbb{N}_{0}}\equiv\mathfrak{F}^{X_{0},\mathbf{W}}\) be the filtration generated by the initial state \(X_{0}\) of \(\mathbf{X}\) and the innovation process \(\mathbf{W}\). For any \(S,T\in \mathbb{N}\), write \(X_{T+S}\) for the \(S\)th future state of the process \(\mathbf{X}\) with respect to the current state \(X_{T}\) and write \(\hat{X}_{T+S\mid T}\) for the minimum mean square error predictor of the \(S\)th future state of the process \(\mathbf{X}\), given the information represented by \(\mathcal{F}_{T}^{X_{0},\mathbf{W}}\). Formally, \[\begin{equation} \hat{X}_{T+S\mid T}=\underset{Y\in L^{2}\left(\Omega_{\mathcal{F}_{T}^{X_{0},\mathbf{W}}};\mathbb{R}\right)} {\arg\min}\mathbf{E}\left[\left(Y-X_{T+S}\right)^{2}\right], \tag{12.46} \end{equation}\] where \(L^{2}\left(\Omega_{\mathcal{F}_{T}^{X_{0},\mathbf{W}}};\mathbb{R}\right)\) is the Hilbert space of the random variables which are measurable with respect to the \(\sigma\)-algebra \(\mathcal{F}_{T}^{X_{0},\mathbf{W}}\) and have finite moment of order \(2\).
As a consequence of Equation (12.46), we have \[\begin{equation} \hat{X}_{T+S\mid T}=\mathbf{E}\left[X_{T+S}\mid\mathcal{F}_{T}^{X_{0},\mathbf{W}}\right]. \tag{12.47} \end{equation}\] It is also well known that \[\begin{equation} \hat{X}_{T+S\mid T}=h\left(X_{0},W_{1},\dots,W_{T}\right), \tag{12.48} \end{equation}\] where \(h:\mathbb{R}^{T+1}\rightarrow\mathbb{R}\) is a function such that \[\begin{equation} h\left(\cdot,\cdot,\dots,\cdot\right)= \underset{g:\mathbb{R}^{T+1}\rightarrow\mathbb{R}\text{ s.t. } g\left(X_{0},W_{1},\dots,W_{T}\right)\in L^{2}\left(\Omega_{\mathcal{F}_{T}^{X_{0},\mathbf{W}}};\mathbb{R}\right)} {\arg\min}\mathbf{E}\left[\left(g\left(X_{0},W_{1},\dots,W_{T}\right)-X_{T+S}\right)^{2}\right]. \tag{12.49} \end{equation}\] Moreover, two functions \(h_{1}\) and \(h_{2}\) solving the minimization problem (12.49) can differ only on a subset of \(\mathbb{R}^{T+1}\) with zero Lebesgue measure.

We recall some important results which depend only on the properties of the conditional expectation operator on the Hilbert space of the random variables with finite \(2\)nd moment and are independent of the \(AR(1)\) structure of the process \(\mathbf{X}\).

We have \[\begin{equation} \mathbf{E}\left[\hat{X}_{T+S\mid T}\right]=\mathbf{E}\left[X_{T+S}\right] \tag{12.50} \end{equation}\] and \[\begin{equation} \mathbf{E}\left[\left(X_{T+S}-\hat{X}_{T+S\mid T}\right)\hat{X}_{T+S\mid T}\right]=0, \tag{12.51} \end{equation}\] for all \(S,T\in\mathbb{N}\).

Corollary 12.2 (Conditional expectation properties) We have \[\begin{equation} \mathbf{E}\left[\hat{X}_{T+S\mid T}^{2}\right]=\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}\right] \tag{12.52} \end{equation}\] and \[\begin{equation} \mathbf{D}^{2}\left[\hat{X}_{T+S\mid T}\right]=Cov\left(X_{T+S},\hat{X}_{T+S\mid T}\right), \tag{12.53} \end{equation}\] for all \(S,T\in\mathbb{N}\).

Proof. Thanks to Equation (12.51) we have \[\begin{equation} 0=\mathbf{E}\left[\left(X_{T+S}-\hat{X}_{T+S\mid T}\right)\hat{X}_{T+S\mid T}\right] =\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}-\hat{X}_{T+S\mid T}^{2}\right] =\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}\right]-\mathbf{E}\left[\hat{X}_{T+S\mid T}^{2}\right], \end{equation}\] for all \(S,T\in\mathbb{N}\). This is Equation (12.52). Hence, considering Equation (12.50), we can write \[\begin{align} Cov\left(X_{T+S},\hat{X}_{T+S\mid T}\right) & =\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}\right] -\mathbf{E}\left[X_{T+S}\right]\mathbf{E}\left[\hat{X}_{T+S\mid T}\right] \\ & =\mathbf{E}\left[\hat{X}_{T+S\mid T}^{2}\right]-\mathbf{E}\left[\hat{X}_{T+S\mid T}\right] \mathbf{E}\left[\hat{X}_{T+S\mid T}\right] \\ & =\mathbf{D}^{2}\left[\hat{X}_{T+S\mid T}\right], \end{align}\] for all \(S,T\in\mathbb{N}\), which is Equation (12.53).

Definition 12.6 (Prediction error) We call prediction error of the predictor \(\hat{X}_{T+S\mid T}\) of the state \(X_{T+S}\) the random variable \[\begin{equation} E_{T+S\mid T}\overset{\text{def}}{=}X_{T+S}-\hat{X}_{T+S\mid T}, \quad\forall T,S\in\mathbb{N}. \tag{12.54} \end{equation}\] We call mean square error of the predictor \(\hat{X}_{T+S\mid T}\) of the state \(X_{T+S}\), the positive number \[\begin{equation} \mathbf{MSE}\left[E_{T+S\mid T}\right]\overset{\text{def}}{=} \mathbf{E}\left[E_{T+S\mid T}^{2}\right], \quad\forall T,S\in\mathbb{N}. \tag{12.55} \end{equation}\]

We have \[\begin{equation} \mathbf{E}\left[E_{T+S\mid T}\right]=\mathbf{E}\left[X_{T+S}\right]-\mathbf{E}\left[\hat{X}_{T+S\mid T}\right]=0. \tag{12.56} \end{equation}\] Hence, \[\begin{equation} \mathbf{MSE}\left[E_{T+S\mid T}\right]=\mathbf{D}^{2}\left[E_{T+S\mid T}\right] \tag{12.57} \end{equation}\]

Proposition 12.17 (Prediction error variance) We have \[\begin{equation} \mathbf{D}^{2}\left[E_{T+S\mid T}\right] =\mathbf{D}^{2}\left[X_{T+S}\right]-\mathbf{D}^{2}\left[\hat{X}_{T+S\mid T}\right] \tag{12.58} \end{equation}\] for all \(S,T\in\mathbb{N}\).

Proof. Considering Equations (12.56), (12.50), and (12.51), a straightforward computation gives \[\begin{align} \mathbf{D}^{2}\left[E_{T+S\mid T}\right]&=\mathbf{E}\left[E_{T+S\mid T}^{2}\right] \\ & =\mathbf{E}\left[\left(X_{T+S}-\hat{X}_{T+S\mid T}\right)^{2}\right] \\ & =\mathbf{E}\left[X_{T+S}^{2}+\hat{X}_{T+S\mid T}^{2}-2X_{T+S}\hat{X}_{T+S\mid T}\right] \\ & =\mathbf{E}\left[X_{T+S}^{2}\right]-\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}-\hat{X}_{T+S\mid T}^{2}\right] -\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}\right] \\ & =\mathbf{E}\left[X_{T+S}^{2}\right]-\mathbf{E}\left[X_{T+S}\right]^{2}+\mathbf{E}\left[X_{T+S}\right]^{2} -\mathbf{E}\left[\left(X_{T+S}-\hat{X}_{T+S\mid T}\right)\hat{X}_{T+S\mid T}\right] -\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}\right] \\ & =\mathbf{D}^{2}\left[X_{T+S}\right]+\mathbf{E}\left[X_{T+S}\right] \mathbf{E}\left[X_{T+S}\right]-\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}\right]\\ & =\mathbf{D}^{2}\left[X_{T+S}\right]+\mathbf{E}\left[X_{T+S}\right] \mathbf{E}\left[\hat{X}_{T+S\mid T}\right]-\mathbf{E}\left[X_{T+S}\hat{X}_{T+S\mid T}\right]\\ & =\mathbf{D}^{2}\left[X_{T+S}\right]-Cov\left(X_{T+S},\hat{X}_{T+S\mid T}\right)\\ & =\mathbf{D}^{2}\left[X_{T+S}\right]-\mathbf{D}^{2}\left[\hat{X}_{T+S\mid T}\right], \end{align}\] for all \(S,T\in\mathbb{N}\), as desired.

Now, we turn again our attention to \(AR\left(1\right)\) processes.

Proposition 12.18 (AR(1) process predictor) We have \[\begin{equation} \hat{X}_{T+S\mid T}=\phi^{S}X_{T}+\alpha\frac{1-\phi^{S}}{1-\phi}, \tag{12.59} \end{equation}\] for all \(S,T\in\mathbb{N}\). As a consequence, \[\begin{equation} \mathbf{D}^{2}\left[\hat{X}_{T+S\mid T}\right]=Cov\left(X_{T+S},\hat{X}_{T+S\mid T}\right) =\phi^{2S}\mathbf{D}^{2}\left[X_{T}\right]=\frac{\phi^{2S}}{1-\phi^{2}}\sigma_{\mathbf{W}}^{2}, \tag{12.60} \end{equation}\] for all \(S,T\in\mathbb{N}\).

Proof. Considering Equation (12.11), we can write \[\begin{equation} X_{T+S}=\phi^{S}X_{T}+\alpha\frac{1-\phi^{S}}{1-\phi} +\sum_{t=T+1}^{T+S}\phi^{T+S-t}W_{t}, \end{equation}\] for all \(S,T\in\mathbb{N}\). Therefore, by virtue of the properties of the conditional expectation and considering that \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\), we obtain \[\begin{align} \hat{X}_{T+S\mid T} & =\mathbf{E}\left[\phi^{S}X_{T}+\alpha\frac{1-\phi^{S}}{1-\phi} +\sum_{t=T+1}^{T+S}\phi^{T+S-t}W_{t}\mid\mathcal{F}_{T}^{X_{0},\mathbf{W}}\right] \\ & =\phi^{S}\mathbf{E}\left[X_{T}\mid\mathcal{F}_{T}^{X_{0},\mathbf{W}}\right] +\alpha\frac{1-\phi^{S}}{1-\phi} +\sum_{t=T+1}^{T+S}\phi^{T+S-t}\mathbf{E}\left[W_{t}\mid\mathcal{F}_{T}^{X_{0},\mathbf{W}}\right] \\ & =\phi^{S}X_{T}+\alpha\frac{1-\phi^{S}}{1-\phi} +\sum_{t=T+1}^{T+S}\phi^{T+S-t}\mathbf{E}\left[W_{t}\right] \\ & =\phi^{S}X_{T}+\alpha\frac{1-\phi^{S}}{1-\phi}, \end{align}\] which is Equation (12.59). It follows that \[\begin{equation} \hat{X}_{T+S\mid T}-\mathbf{E}\left[\hat{X}_{T+S\mid T}\right] =\phi^{S}\left(X_{T}-\mathbf{E}\left[X_{T}\right]\right). \end{equation}\] Hence, \[\begin{equation} \mathbf{D}^{2}\left[\hat{X}_{T+S\mid T}\right] =\mathbf{E}\left[\left(\hat{X}_{T+S\mid T}-\mathbf{E}\left[\hat{X}_{T+S\mid T}\right]\right)^{2}\right] =\mathbf{E}\left[\phi^{2S}\left(X_{T}-\mathbf{E}\left[X_{T}\right]\right)^{2}\right] =\phi^{2S}\mathbf{D}^{2}\left[X_{T}\right], \end{equation}\] for all \(S,T\in\mathbb{N}\). Together with Equation (12.53) and the steady-state variance \(\mathbf{D}^{2}\left[X_{T}\right]=\sigma_{\mathbf{W}}^{2}/\left(1-\phi^{2}\right)\), this yields the desired Equation (12.60).

Proposition 12.19 (AR(1) processes prediction error variance) We have \[\begin{equation} \mathbf{D}^{2}\left[E_{T+S\mid T}\right]=\frac{1-\phi^{2S}}{1-\phi^{2}}\sigma_{\mathbf{W}}^{2}, \tag{12.61} \end{equation}\] for all \(S,T\in\mathbb{N}\).

Proof. Combining Equations (12.37), (12.58), and (12.60), the desired result immediately follows.

Proposition 12.20 (AR(1) prediction intervals) Assume that \(X_{0}\) is Gaussian and \(\mathbf{W}\) is a Gaussian white noise, \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\), for some \(\sigma_{\mathbf{W}}>0\). Then the predictor \(\hat{X}_{T+S\mid T}\) of the state \(X_{T+S}\) is also Gaussian for all \(S,T\in\mathbb{N}\). Therefore, a prediction interval for the state \(X_{T+S}\), for any \(S\in\mathbb{N}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[ \left(\hat{X}_{T+S\mid T}-z_{\alpha/2}\sqrt{\frac{1-\phi^{2S}}{1-\phi^{2}}\sigma_{\mathbf{W}}^{2}},\ \hat{X}_{T+S\mid T}+z_{\alpha/2}\sqrt{\frac{1-\phi^{2S}}{1-\phi^{2}}\sigma_{\mathbf{W}}^{2}}\right), \tag{12.62} \] where \(\hat{X}_{T+S\mid T}\) is given by (12.59), the variance under the square root is the prediction error variance (12.61), and \(z_{\alpha/2}\) is the upper tail critical value of level \(\alpha/2\) of the standard Gaussian random variable. The realization of the prediction interval is then given by \[ \left(\hat{x}_{T+S\mid T}-z_{\alpha/2} \sqrt{\frac{1-\hat{\phi}\left(\omega\right)^{2S}} {1-\hat{\phi}\left(\omega\right)^{2}}\hat{\sigma}_{\mathbf{W}}^{2}\left(\omega\right)},\ \hat{x}_{T+S\mid T}+z_{\alpha/2}\sqrt{\frac{1-\hat{\phi}\left(\omega\right)^{2S}} {1-\hat{\phi}\left(\omega\right)^{2}}\hat{\sigma}_{\mathbf{W}}^{2}\left(\omega\right)}\right), \tag{12.63} \] where \(\hat{x}_{T+S\mid T}\) is the realization of the predictor \(\hat{X}_{T+S\mid T}\) of the state \(X_{T+S}\) and \(\hat{\phi}\left(\omega\right)\) [resp. \(\hat{\sigma}_{\mathbf{W}}\left(\omega\right)\)] is the estimated value of the parameter \(\phi\) [resp. \(\sigma_{\mathbf{W}}\)].
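The \(S\)-step point prediction (12.59) and a 95% prediction interval based on the prediction error variance (12.61) can be computed directly in R. The fitted values `alpha_hat`, `phi_hat`, `sigma2_hat` and the current state `x_T` below are illustrative assumptions, not estimates taken from the text.

```r
# S-step-ahead point prediction and 95% prediction interval for a fitted AR(1).
# All numerical values below are illustrative assumptions.
alpha_hat <- 0.5; phi_hat <- 0.3; sigma2_hat <- 4.0
x_T <- 1.2         # last observed state (assumed)
S <- 3             # prediction horizon
x_pred <- phi_hat^S * x_T + alpha_hat*(1 - phi_hat^S)/(1 - phi_hat)  # Eq. (12.59)
pe_var <- (1 - phi_hat^(2*S))/(1 - phi_hat^2) * sigma2_hat           # Eq. (12.61)
z <- qnorm(0.975)  # upper-tail critical value z_{alpha/2} for alpha = 0.05
c(lower=x_pred - z*sqrt(pe_var), point=x_pred, upper=x_pred + z*sqrt(pe_var))
```

Note how, as \(S\) grows, the point prediction reverts to the steady-state mean \(\hat{\alpha}/\left(1-\hat{\phi}\right)\) while the interval half-width approaches \(z_{\alpha/2}\,\hat{\sigma}_{\mathbf{W}}/\sqrt{1-\hat{\phi}^{2}}\).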

12.1.3 Examples

We build an \(AR(1)\) process and consider some of its sample paths.

t <- seq(from=-0.49, to=1.00, length.out=150)     # Choosing the time set.
a <- 0.5                                          # Choosing the drift coefficient. 
b <- 5.0                                          # Choosing the linear trend coefficient.
f <- 0.3                                          # Choosing the regression coefficient.
set.seed(12345, kind=NULL, normal.kind=NULL)      # Setting a random seed for reproducibility.
Gauss_r <- rnorm(n=150, mean=0, sd=9)             # Determining one of the possible values of the Gaussian 
                                                  # random variables in the state innovation process. 

                                                  # Showing the values taken by the Gaussian random variables
                                                  # in the state innovation process. 
head(Gauss_r)                                     # Initial part of the sample path of the state innovation.
## [1]   5.2697594   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037
tail(Gauss_r)                                     # Final part of the sample path of the state innovation.
## [1]   0.1427012   4.8615261 -13.9256277   7.6468764   8.0641187   1.2482190
x_r <- rep(NA,150)                                # Setting an empty vector of length 150 to store 
                                                  # the sample path of the AR(1) process, corresponding to 
                                                  # the sample path of the state innovation.
x0 <- 0                                           # Choosing the starting point of the AR(1) process.
x_r[1] <- a + b*t[1] + f*x0 + Gauss_r[1]          # Determining the first point (after the starting point) 
                                                  # of the sample path of the AR(1) process. 
for (n in 2:150)
{x_r[n] <- a + b*t[n] + f*x_r[n-1] + Gauss_r[n]}  # Determining the other points of the sample path 
                                                  # of the AR(1) process.
head(x_r)                                         # Initial part of the sample path of the AR(1) process.
## [1]   3.319759   5.481122  -1.189393  -6.238293   1.831499 -17.512154
tail(x_r)                                         # Final part of the sample path of the AR(1) process.
## [1]  2.672953 10.963412 -5.286604 11.460895 16.952387 11.833935
set.seed(23451, kind=NULL, normal.kind=NULL)      # Setting another random seed for reproducibility 
                                                  # and building another sample path of the AR(1) process.

Gauss_b <- replace(Gauss_r, c(51:150), rnorm(n=100, mean=0, sd=9)) # Building another sample path of the  
                                                                   # Gaussian state innovation process,  
                                                                   # which retains the first 50 sample points 
                                                                   # of the former path.
                                                  # Showing the values taken by the Gaussian random variables
                                                  # in the state innovation process. 
head(Gauss_b)                                     # Initial part of the sample path of the state innovation.
## [1]   5.2697594   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037
tail(Gauss_b)                                     # Final part of the sample path of the state innovation.
## [1] -4.807130  5.182616 -4.460372  1.885254  4.087216 -1.627632
x_b <- rep(NA,150)                                # Setting an empty vector of length 150 to store 
                                                  # the sample path of the AR(1) process, corresponding to 
                                                  # the sample path of the state innovation.
x0 <- 0                                           # Choosing the starting point of the AR(1) process.

x_b[1] <- a + b*t[1] + f*x0 + Gauss_b[1]          # Determining the first point (after the starting point) 
                                                  # of the sample path of the AR(1) process. 
for (n in 2:150)
{x_b[n] <- a + b*t[n] + f*x_b[n-1] + Gauss_b[n]}  # Determining the other points of the sample path 
                                                  # of the AR(1) process.
head(x_b)                                         # Initial part of the sample path of the AR(1) process.
## [1]   3.319759   5.481122  -1.189393  -6.238293   1.831499 -17.512154
tail(x_b)                                         # Final part of the sample path of the AR(1) process.
## [1] -0.2367807 10.4115815  4.0131024  8.4891843 12.0839709  7.4975591
set.seed(34512, kind=NULL, normal.kind=NULL)
Gauss_g <- replace(Gauss_r, c(51:150), rnorm(n=100, mean=0, sd=9))

head(Gauss_g)                                     # Initial part of the sample path of the AR(1) process
## [1]   5.2697594   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037
tail(Gauss_g)                                     # Final part of the sample path of the AR(1) process
## [1] -8.75065929  6.86117068  1.45432151 -0.06290163 -2.54515536 -7.59723922
x_g <- rep(NA,150)                                # Setting an empty vector of length 150 to store 
                                                  # the sample path of the AR(1) process, corresponding to 
                                                  # the sample path of the state innovation.
x0 <- 0                                           # Choosing the starting point of the AR(1) process.
x_g[1] <- a + b*t[1] + f*x0 + Gauss_g[1]          # Determining the first point (after the starting point) 
                                                  # of the sample path of the AR(1) process. 
for (n in 2:150)
{x_g[n] <- a + b*t[n] + f*x_g[n-1] + Gauss_g[n]}  # Determining the other points of the sample path 
                                                  # of the AR(1) process.
head(x_g)                                         # Initial part of the sample path of the AR(1) process.
## [1]   3.319759   5.481122  -1.189393  -6.238293   1.831499 -17.512154
tail(x_g)                                         # Final part of the sample path of the AR(1) process.
## [1]  1.7514180 12.6865961 10.6103003  8.5201885  5.4609012 -0.4589689
Gauss_AR1_df <- data.frame(t,x_r,x_b,x_g)         # Generating a data frame from the time variable 
                                                  # and the three paths of the AR(1) process.
head(Gauss_AR1_df)
##       t        x_r        x_b        x_g
## 1 -0.49   3.319759   3.319759   3.319759
## 2 -0.48   5.481122   5.481122   5.481122
## 3 -0.47  -1.189393  -1.189393  -1.189393
## 4 -0.46  -6.238293  -6.238293  -6.238293
## 5 -0.45   1.831499   1.831499   1.831499
## 6 -0.44 -17.512154 -17.512154 -17.512154
# library(dplyr)
Gauss_AR1_df <- add_row(Gauss_AR1_df,  t=-0.50, x_r=0, x_b=0, x_g=0, .before=1) # Adding a row to represent  
                                                                                # the starting point of the
                                                                                # AR(1) process.
head(Gauss_AR1_df)
##       t       x_r       x_b       x_g
## 1 -0.50  0.000000  0.000000  0.000000
## 2 -0.49  3.319759  3.319759  3.319759
## 3 -0.48  5.481122  5.481122  5.481122
## 4 -0.47 -1.189393 -1.189393 -1.189393
## 5 -0.46 -6.238293 -6.238293 -6.238293
## 6 -0.45  1.831499  1.831499  1.831499
tail(Gauss_AR1_df)
##        t       x_r        x_b        x_g
## 146 0.95  2.672953 -0.2367807  1.7514180
## 147 0.96 10.963412 10.4115815 12.6865961
## 148 0.97 -5.286604  4.0131024 10.6103003
## 149 0.98 11.460895  8.4891843  8.5201885
## 150 0.99 16.952387 12.0839709  5.4609012
## 151 1.00 11.833935  7.4975591 -0.4589689

We plot the paths of the \(AR(1)\) process. First, the scatter plot

# library(ggplot2)
Data_df <- Gauss_AR1_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Paths of a Gaussian AR(1) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  regression par. ", phi==.(f),","),
                                paste("state innovation random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    state innovation var. par. ", sigma^2==81,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)-min(Data_df$x_r,Data_df$x_b,Data_df$x_g))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_b_col <- bquote("random seed" ~  23451)
x_g_col <- bquote("random seed" ~  34512)
leg_labs <- c(x_k_col, x_r_col, x_b_col, x_g_col)
leg_ord <- c("x_k_col", "x_r_col", "x_b_col", "x_g_col")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_b_col"="blue", "x_g_col"="green")
Data_df_SP <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_point(alpha=1, size=1, aes(y=x_r, color="x_r_col")) +
  geom_point(alpha=1, size=1, aes(y=x_b, color="x_b_col")) +
  geom_point(alpha=1, size=1, aes(y=x_g, color="x_g_col")) +
  geom_point(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_SP)

To save the plot in a png or pdf file

# file_name = paste("Scatter Plot of Three Paths of a Gaussian AR(1) process with Drift and Linear Trend - SP.png", sep="")
# png(file_name, width=1600, height=800, res=120)
# print(Data_df_SP)
# dev.off()
# 
# file_name = paste("Scatter Plot of Three Paths of a Gaussian AR(1) process with Drift and Linear Trend - SP.pdf", sep="")
# pdf(file_name, width=14, height=8)
# print(Data_df_SP)
# dev.off()

Second, the line plot

# library(ggplot2)
Data_df <- Gauss_AR1_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of Three Paths of a Gaussian AR(1) process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  regression par. ", phi==.(f),","),
                                paste("state innovation random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    state innovation var. par. ", sigma^2==81,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)-min(Data_df$x_r,Data_df$x_b,Data_df$x_g))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <-  0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_b_col <- bquote("random seed" ~  23451)
x_g_col <- bquote("random seed" ~  34512)
leg_labs <- c(x_k_col, x_r_col, x_b_col, x_g_col)
leg_ord <- c("x_k_col", "x_r_col", "x_b_col", "x_g_col")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_b_col"="blue", "x_g_col"="green")
Data_df_LP <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_line(alpha=1, size=0.6, aes(y=x_b, color="x_b_col"), group=1) +
  geom_line(alpha=1, size=0.6, aes(y=x_g, color="x_g_col"), group=1) +
  geom_line(alpha=1, size=0.6, aes(y=x_r, color="x_r_col"), group=1) +
  geom_line(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_LP)

From the visual inspection of both the scatter and the line plot, the three paths of the AR(1) process show slight evidence of a trend, but no evidence of seasonality. Moreover, there is no visual evidence of heteroskedasticity.

We concentrate on the analysis of the black-red path, characterized by random seed 12345.

The scatter plot

# library(ggplot2)
Data_df <- Gauss_AR1_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of the Black-Red Path of a Gaussian AR(1) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  regression par. ", phi==.(f),","),
                                paste("state innovation random seeds", 12345, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r)-min(Data_df$x_r))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <-  0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_rgrln <- bquote("Regression Line")
x_loess <- bquote("LOESS Curve")
leg_labs <- c(x_k_col, x_r_col, x_rgrln, x_loess)
leg_ord  <- c("x_k_col", "x_r_col", "x_rgrln", "x_loess")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_rgrln"="blue", "x_loess"="green")
Data_df_SP <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.5, linetype="solid", aes(x=t, y=x_r, color="x_rgrln"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=x_r, color="x_loess"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1, aes(y=x_r, color="x_r_col")) +
  geom_point(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
 scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord,
                     guide=guide_legend(override.aes=list(shape=c(NA,NA,NA,NA),
                     linetype=c("dotted", "dotted", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_SP)

The line plot.

# library(ggplot2)
Data_df <- Gauss_AR1_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of the Black-Red Path of a Gaussian AR(1) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  regression par. ", phi==.(f),","),
                                paste("state innovation random seeds ", 12345, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r)-min(Data_df$x_r))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <-  0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_rgrln <- bquote("Regression Line")
x_loess <- bquote("LOESS Curve")
leg_labs <- c(x_k_col, x_r_col, x_rgrln, x_loess)
leg_ord  <- c("x_k_col", "x_r_col", "x_rgrln", "x_loess")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_rgrln"="blue", "x_loess"="green")
Data_df_LP <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.5, linetype="solid", aes(x=t, y=x_r, color="x_rgrln"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=x_r, color="x_loess"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.6, aes(y=x_r, color="x_r_col"), group=1) +
  geom_line(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  #  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord,
                     guide=guide_legend(override.aes=list(linetype=c("solid", "solid", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_LP)

In terms of a visual inspection for evidence of trend and seasonality, it is also useful to consider the correlograms of the time series.
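As a reminder (standard definitions, not stated explicitly at this point of the notes), the sample autocorrelation at lag \(k\) of observations \(y_{1},\dots,y_{n}\) is

```latex
r_{k}=\frac{\sum_{t=1}^{n-k}\left(y_{t}-\bar{y}\right)\left(y_{t+k}-\bar{y}\right)}
           {\sum_{t=1}^{n}\left(y_{t}-\bar{y}\right)^{2}},
\qquad k=1,2,\dots
```

Under the null hypothesis of white noise, \(r_{k}\) is approximately \(\mathcal{N}\left(0,1/n\right)\)-distributed, which justifies the confidence bands \(\pm z_{(1+\gamma)/2}/\sqrt{n}\), \(\gamma=0.90,0.95,0.99\), computed in the code via `qnorm((1+0.90)/2)/sqrt(length)` and drawn as horizontal lines in the correlograms.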

Plot of the autocorrelogram.

y <- Gauss_AR1_df$x_r
length <- length(y)
maxlag <- ceiling(10*log10(length))
Aut_Fun_y <- acf(y, lag.max = maxlag, type="correlation", plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_Aut_Fun_y <- data.frame(lag=Aut_Fun_y$lag, acf=Aut_Fun_y$acf)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of the Autocorrelogram of the Black-Red Path of a Gaussian AR(1) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_Aut_Fun_y, aes(x=lag, y=acf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=acf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), labels=waiver()) +
  scale_y_continuous(name="acf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

Plot of the partial autocorrelogram.

y <- Gauss_AR1_df$x_r
length <- length(y)
maxlag <- ceiling(10*log10(length))
P_Aut_Fun_y <- pacf(y, lag.max = maxlag, plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_P_Aut_Fun_y <- data.frame(lag=P_Aut_Fun_y$lag, pacf=P_Aut_Fun_y$acf)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of the Partial Autocorrelogram of the Black-Red Path of a Gaussian AR(1) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_P_Aut_Fun_y, aes(x=lag, y=pacf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=pacf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), labels=waiver()) +
  scale_y_continuous(name="pacf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

From the plots we have no visual evidence of seasonality and no visual evidence of a strong trend component. In addition, the partial autocorrelogram shows visual evidence of autocorrelation of AR(1) or AR(2) type.

We apply the Ljung-Box (LB) test

y <- Gauss_AR1_df$x_r
Box.test(y, lag = 1, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  y
## X-squared = 13.672, df = 1, p-value = 0.0002177

The null hypothesis of no autocorrelation is rejected at any conventional significance level.
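For reference (our notation, not from the notes): with \(n\) observations and \(h\) lags, the LB test statistic is

```latex
Q\left(h\right)=n\left(n+2\right)\sum_{k=1}^{h}\frac{\hat{\rho}_{k}^{2}}{n-k},
```

where \(\hat{\rho}_{k}\) is the sample autocorrelation at lag \(k\). Under the null hypothesis of no autocorrelation, \(Q\left(h\right)\) is asymptotically \(\chi_{h}^{2}\)-distributed; since `Box.test` was called with `lag = 1`, we have \(h=1\), consistent with the reported `df = 1`.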

In light of the visual evidence from the scatter and line plots and from the correlograms, and of the computational evidence from the LB test, there is overall evidence that the process generating the time series might be an AR(1) or AR(2) process with a linear trend.

Therefore, we apply the Augmented Dickey-Fuller (ADF) and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests to check for the possible presence of a trend by means of an autoregressive linear model.

The ADF test assumes, as the null hypothesis, that the time series is generated by a stochastic process with a random walk component. This null hypothesis is the reason why the ADF test is referred to as a unit root test. In addition, three alternative hypotheses are considered:

  1. the time series is generated by an autoregressive process with no drift and no trend, satisfying the equation \[\begin{equation} \Delta X_{t}=\phi_{1}X_{t-1}+\delta_{1}\Delta X_{t-1}+\cdots+\delta_{l-1}\Delta X_{t-l+1}+W_{t}, \quad\forall t\in\mathbb{N}, \tag{12.64} \end{equation}\] for some \(l\geq 1\), where \(\delta_{1},\dots,\delta_{l-1}\in\mathbb{R}\) and \(\delta_{0}\equiv 0\);

  2. the time series is generated by an autoregressive process with drift and no trend, satisfying the equation \[\begin{equation} \Delta X_{t}=\alpha + \phi_{1}X_{t-1}+\delta_{1}\Delta X_{t-1}+\cdots+\delta_{l-1}\Delta X_{t-l+1}+W_{t}, \quad\forall t\in\mathbb{N}, \tag{12.65} \end{equation}\] for some \(l\geq 1\), where \(\delta_{1},\dots,\delta_{l-1}\in\mathbb{R}\) and \(\delta_{0}\equiv 0\);

  3. the time series is generated by an autoregressive process with drift and linear trend, satisfying the equation \[\begin{equation} \Delta X_{t}=\alpha + \beta t + \phi_{1}X_{t-1}+\delta_{1}\Delta X_{t-1}+\cdots+\delta_{l-1}\Delta X_{t-l+1}+W_{t}, \quad\forall t\in\mathbb{N}, \tag{12.66} \end{equation}\] for some \(l\geq 1\), where \(\delta_{1},\dots,\delta_{l-1}\in\mathbb{R}\) and \(\delta_{0}\equiv 0\).

In the above equations, the symbol \(\Delta\) represents the difference operator. Hence, we have \[\begin{equation} \Delta X_{t-\ell}\equiv X_{t-\ell}-X_{t-\ell-1}, \quad\forall \ell=0,\dots,l-1. \end{equation}\]
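In all three specifications, the unit root test can be written in the standard form

```latex
H_{0}:\phi_{1}=0
\qquad\text{against}\qquad
H_{1}:\phi_{1}<0,
```

with test statistic given by the \(t\)-ratio \(\hat{\phi}_{1}/\widehat{\operatorname{se}}\left(\hat{\phi}_{1}\right)\). Under \(H_{0}\) this statistic follows a non-standard Dickey-Fuller distribution, whose critical values are reported as tau1, tau2, or tau3 in the urca output, depending on the deterministic components included in the regression.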

Considering the version of the ADF test implemented in the library urca, we have several possibilities:

  1. We test the unit root hypothesis against the alternative of autoregression with no drift and no trend, by choosing a fixed number of lags \(l\) or selecting the number of lags by the Akaike information criterion or the Bayes information criterion.
# library(urca)
y <- Gauss_AR1_df$x_r
long_lags <- floor(12*(length(y)/100)^(1/4))
y_ADF_urca_none_long_lags <- ur.df(y, type="none", lags=long_lags, selectlags = "Fixed")
summary(y_ADF_urca_none_long_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.680  -6.331   2.508   8.307  21.099 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## z.lag.1      -0.22693    0.14611  -1.553  0.12295   
## z.diff.lag1  -0.39325    0.16117  -2.440  0.01612 * 
## z.diff.lag2  -0.43891    0.16132  -2.721  0.00746 **
## z.diff.lag3  -0.37284    0.16267  -2.292  0.02361 * 
## z.diff.lag4  -0.34944    0.16132  -2.166  0.03223 * 
## z.diff.lag5  -0.18387    0.15783  -1.165  0.24629   
## z.diff.lag6  -0.33285    0.15179  -2.193  0.03020 * 
## z.diff.lag7  -0.10817    0.15159  -0.714  0.47681   
## z.diff.lag8  -0.29639    0.14298  -2.073  0.04026 * 
## z.diff.lag9  -0.28999    0.14060  -2.062  0.04127 * 
## z.diff.lag10 -0.19220    0.13319  -1.443  0.15154   
## z.diff.lag11 -0.12496    0.12188  -1.025  0.30724   
## z.diff.lag12 -0.11427    0.10635  -1.074  0.28473   
## z.diff.lag13  0.08467    0.09021   0.939  0.34978   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.51 on 123 degrees of freedom
## Multiple R-squared:  0.3955, Adjusted R-squared:  0.3267 
## F-statistic: 5.749 on 14 and 123 DF,  p-value: 1.673e-08
## 
## 
## Value of test-statistic is: -1.5532 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
short_lags <- floor(4*(length(y)/100)^(1/4))
y_ADF_urca_none_short_lags <- ur.df(y, type="none", lags=short_lags, selectlags = "Fixed")
summary(y_ADF_urca_none_short_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.143  -6.457   2.778   8.517  28.518 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## z.lag.1     -0.38869    0.11821  -3.288  0.00127 **
## z.diff.lag1 -0.25153    0.12013  -2.094  0.03807 * 
## z.diff.lag2 -0.24287    0.11135  -2.181  0.03084 * 
## z.diff.lag3 -0.17992    0.09927  -1.812  0.07204 . 
## z.diff.lag4 -0.15136    0.08441  -1.793  0.07508 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.65 on 141 degrees of freedom
## Multiple R-squared:  0.3241, Adjusted R-squared:  0.3002 
## F-statistic: 13.52 on 5 and 141 DF,  p-value: 8.92e-11
## 
## 
## Value of test-statistic is: -3.2882 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
y_ADF_urca_none_AIC <- ur.df(y, type="none", lags=long_lags, selectlags = "AIC")
summary(y_ADF_urca_none_AIC)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.881  -5.710   3.858  10.356  28.079 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.55532    0.09356  -5.935 2.34e-08 ***
## z.diff.lag -0.04949    0.08628  -0.574    0.567    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.85 on 135 degrees of freedom
## Multiple R-squared:  0.2926, Adjusted R-squared:  0.2822 
## F-statistic: 27.93 on 2 and 135 DF,  p-value: 7.087e-11
## 
## 
## Value of test-statistic is: -5.9352 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
y_ADF_urca_none_BIC <- ur.df(y, type="none", lags=long_lags, selectlags = "BIC")
summary(y_ADF_urca_none_BIC)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.881  -5.710   3.858  10.356  28.079 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.55532    0.09356  -5.935 2.34e-08 ***
## z.diff.lag -0.04949    0.08628  -0.574    0.567    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.85 on 135 degrees of freedom
## Multiple R-squared:  0.2926, Adjusted R-squared:  0.2822 
## F-statistic: 27.93 on 2 and 135 DF,  p-value: 7.087e-11
## 
## 
## Value of test-statistic is: -5.9352 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
  2. We test the unit root hypothesis against the alternative of autoregression with drift and no trend, by choosing a fixed number of lags \(l\) or selecting the number of lags by the Akaike information criterion or the Bayes information criterion.
# library(urca)
y <- Gauss_AR1_df$x_r
long_lags <- floor(12*(length(y)/100)^(1/4))
y_ADF_urca_drift_long_lags <- ur.df(y, type="drift", lags=long_lags, selectlags = "Fixed")
summary(y_ADF_urca_drift_long_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.855  -7.475   0.052   7.118  20.236 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.4442986  1.4160848   2.432  0.01646 * 
## z.lag.1      -0.6650517  0.2301589  -2.890  0.00457 **
## z.diff.lag1   0.0002099  0.2261584   0.001  0.99926   
## z.diff.lag2  -0.0806804  0.2161372  -0.373  0.70959   
## z.diff.lag3  -0.0506186  0.2073521  -0.244  0.80755   
## z.diff.lag4  -0.0614554  0.1975944  -0.311  0.75632   
## z.diff.lag5   0.0695728  0.1865762   0.373  0.70988   
## z.diff.lag6  -0.1052107  0.1758250  -0.598  0.55069   
## z.diff.lag7   0.0901831  0.1695461   0.532  0.59576   
## z.diff.lag8  -0.1191592  0.1580082  -0.754  0.45222   
## z.diff.lag9  -0.1412313  0.1508274  -0.936  0.35093   
## z.diff.lag10 -0.0764586  0.1390048  -0.550  0.58330   
## z.diff.lag11 -0.0389623  0.1246370  -0.313  0.75511   
## z.diff.lag12 -0.0567667  0.1069328  -0.531  0.59648   
## z.diff.lag13  0.1148476  0.0893265   1.286  0.20098   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.31 on 122 degrees of freedom
## Multiple R-squared:  0.4235, Adjusted R-squared:  0.3573 
## F-statistic: 6.401 on 14 and 122 DF,  p-value: 1.696e-09
## 
## 
## Value of test-statistic is: -2.8895 4.2124 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81
short_lags <- floor(4*(length(y)/100)^(1/4))
y_ADF_urca_drift_short_lags <- ur.df(y, type="drift", lags=short_lags, selectlags = "Fixed")
summary(y_ADF_urca_drift_short_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.524  -7.943   1.033   6.863  26.905 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.97964    1.07012   2.784   0.0061 ** 
## z.lag.1     -0.62572    0.14346  -4.362 2.49e-05 ***
## z.diff.lag1 -0.07265    0.13379  -0.543   0.5880    
## z.diff.lag2 -0.10383    0.11969  -0.867   0.3872    
## z.diff.lag3 -0.08361    0.10296  -0.812   0.4181    
## z.diff.lag4 -0.09574    0.08484  -1.129   0.2610    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.41 on 140 degrees of freedom
## Multiple R-squared:  0.3595, Adjusted R-squared:  0.3367 
## F-statistic: 15.72 on 5 and 140 DF,  p-value: 2.885e-12
## 
## 
## Value of test-statistic is: -4.3616 9.5414 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81
y_ADF_urca_drift_AIC <- ur.df(y, type="drift", lags=long_lags, selectlags = "AIC")
summary(y_ADF_urca_drift_AIC)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.9785  -8.3785   0.9614   7.1815  25.4846 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.91709    1.03094   3.800 0.000219 ***
## z.lag.1     -0.75857    0.10403  -7.291  2.4e-11 ***
## z.diff.lag   0.05199    0.08651   0.601 0.548878    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.35 on 134 degrees of freedom
## Multiple R-squared:  0.3614, Adjusted R-squared:  0.3519 
## F-statistic: 37.92 on 2 and 134 DF,  p-value: 8.888e-14
## 
## 
## Value of test-statistic is: -7.2914 26.5843 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81
y_ADF_urca_drift_BIC <- ur.df(y, type="drift", lags=long_lags, selectlags = "BIC")
summary(y_ADF_urca_drift_BIC)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.9785  -8.3785   0.9614   7.1815  25.4846 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.91709    1.03094   3.800 0.000219 ***
## z.lag.1     -0.75857    0.10403  -7.291  2.4e-11 ***
## z.diff.lag   0.05199    0.08651   0.601 0.548878    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.35 on 134 degrees of freedom
## Multiple R-squared:  0.3614, Adjusted R-squared:  0.3519 
## F-statistic: 37.92 on 2 and 134 DF,  p-value: 8.888e-14
## 
## 
## Value of test-statistic is: -7.2914 26.5843 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81
  3. We test the unit root hypothesis against the alternative of autoregression with drift and linear trend, by choosing a fixed number of lags \(l\) or selecting the number of lags by the Akaike information criterion or the Bayes information criterion.
# library(urca)
y <- Gauss_AR1_df$x_r
long_lags <- floor(12*(length(y)/100)^(1/4))
y_ADF_urca_trend_long_lags <- ur.df(y, type="trend", lags=long_lags, selectlags = "Fixed")
summary(y_ADF_urca_trend_long_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.2974  -7.7198   0.5942   7.4005  20.2827 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   1.550509   2.061693   0.752  0.45348   
## z.lag.1      -0.846909   0.271131  -3.124  0.00224 **
## tt            0.033659   0.026689   1.261  0.20968   
## z.diff.lag1   0.172408   0.263714   0.654  0.51450   
## z.diff.lag2   0.084332   0.252211   0.334  0.73868   
## z.diff.lag3   0.107727   0.241976   0.445  0.65697   
## z.diff.lag4   0.085833   0.229119   0.375  0.70860   
## z.diff.lag5   0.206166   0.215346   0.957  0.34029   
## z.diff.lag6   0.021587   0.202174   0.107  0.91515   
## z.diff.lag7   0.208141   0.193277   1.077  0.28366   
## z.diff.lag8  -0.014781   0.178035  -0.083  0.93397   
## z.diff.lag9  -0.049560   0.167102  -0.297  0.76729   
## z.diff.lag10 -0.003485   0.150258  -0.023  0.98153   
## z.diff.lag11  0.016568   0.131903   0.126  0.90025   
## z.diff.lag12 -0.018664   0.110871  -0.168  0.86660   
## z.diff.lag13  0.137924   0.090970   1.516  0.13209   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.28 on 121 degrees of freedom
## Multiple R-squared:  0.431,  Adjusted R-squared:  0.3604 
## F-statistic: 6.109 on 15 and 121 DF,  p-value: 2.245e-09
## 
## 
## Value of test-statistic is: -3.1236 3.352 4.9901 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47
short_lags <- floor(4*(length(y)/100)^(1/4))
y_ADF_urca_trend_short_lags <- ur.df(y, type="trend", lags=short_lags, selectlags = "Fixed")
summary(y_ADF_urca_trend_short_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.484  -7.828   1.347   6.879  25.034 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.68930    1.79645   0.384    0.702    
## z.lag.1     -0.70656    0.15156  -4.662 7.27e-06 ***
## tt           0.03417    0.02159   1.583    0.116    
## z.diff.lag1 -0.01190    0.13850  -0.086    0.932    
## z.diff.lag2 -0.05687    0.12270  -0.464    0.644    
## z.diff.lag3 -0.04996    0.10459  -0.478    0.634    
## z.diff.lag4 -0.07650    0.08526  -0.897    0.371    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.35 on 139 degrees of freedom
## Multiple R-squared:  0.3709, Adjusted R-squared:  0.3437 
## F-statistic: 13.66 on 6 and 139 DF,  p-value: 3.743e-12
## 
## 
## Value of test-statistic is: -4.6618 7.2643 10.8663 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47
y_ADF_urca_trend_AIC <- ur.df(y, type="trend", lags=long_lags, selectlags = "AIC")
summary(y_ADF_urca_trend_AIC)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.807  -8.095   1.174   6.908  24.005 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.72230    2.05032   0.840    0.402    
## z.lag.1     -0.77675    0.10486  -7.407 1.33e-11 ***
## tt           0.02790    0.02254   1.238    0.218    
## z.diff.lag   0.05942    0.08655   0.687    0.494    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.33 on 133 degrees of freedom
## Multiple R-squared:  0.3687, Adjusted R-squared:  0.3545 
## F-statistic: 25.89 on 3 and 133 DF,  p-value: 2.955e-13
## 
## 
## Value of test-statistic is: -7.4072 18.3037 27.4539 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47
y_ADF_urca_trend_BIC <- ur.df(y, type="trend", lags=long_lags, selectlags = "BIC")
summary(y_ADF_urca_trend_BIC)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.807  -8.095   1.174   6.908  24.005 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.72230    2.05032   0.840    0.402    
## z.lag.1     -0.77675    0.10486  -7.407 1.33e-11 ***
## tt           0.02790    0.02254   1.238    0.218    
## z.diff.lag   0.05942    0.08655   0.687    0.494    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.33 on 133 degrees of freedom
## Multiple R-squared:  0.3687, Adjusted R-squared:  0.3545 
## F-statistic: 25.89 on 3 and 133 DF,  p-value: 2.955e-13
## 
## 
## Value of test-statistic is: -7.4072 18.3037 27.4539 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47

Note that the formula for long_lags is due to Schwert (1989).
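For a series of length \(n\), these rules of thumb take the following form (a minimal sketch; the constants \(12\) and \(4\) are the usual "long" and "short" choices, assumed here, and for \(n=150\) they reproduce the lag lengths used in these notes):

```r
# Schwert's (1989) rules of thumb for the maximum lag length of unit-root tests,
# evaluated for a series of length n = 150.
n <- 150
long_lags  <- floor(12 * (n / 100)^(1 / 4))   # "long" rule
short_lags <- floor(4 * (n / 100)^(1 / 4))    # "short" rule
long_lags                                     # 13
short_lags                                    # 4
```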

In light of the number of lags selected by the AIC and BIC, we also apply the ADF test with \(0\) lags; that is, we apply the original Dickey-Fuller test.

# library(urca)
y <- Gauss_AR1_df$x_r
y_ADF_urca_none_0_lags <- ur.df(y, type="none", lags=0, selectlags = "Fixed")
summary(y_ADF_urca_none_0_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.631  -5.357   3.503   9.797  28.821 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## z.lag.1 -0.59299    0.07515  -7.891 5.95e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.61 on 149 degrees of freedom
## Multiple R-squared:  0.2947, Adjusted R-squared:   0.29 
## F-statistic: 62.27 on 1 and 149 DF,  p-value: 5.955e-13
## 
## 
## Value of test-statistic is: -7.8913 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
y_ADF_urca_drift_0_lags <- ur.df(y, type="drift", lags=0, selectlags = "Fixed")
summary(y_ADF_urca_drift_0_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.5229  -8.0928   0.3401   6.8946  25.4383 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.21208    0.90487   3.550 0.000517 ***
## z.lag.1     -0.70110    0.07853  -8.928 1.55e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.22 on 148 degrees of freedom
## Multiple R-squared:  0.3501, Adjusted R-squared:  0.3457 
## F-statistic: 79.71 on 1 and 148 DF,  p-value: 1.554e-15
## 
## 
## Value of test-statistic is: -8.9282 39.8606 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81
y_ADF_urca_trend_0_lags <- ur.df(y, type="trend", lags=0, selectlags = "Fixed")
summary(y_ADF_urca_trend_0_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.1092  -7.5774   0.7977   6.3690  23.1049 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.60657    1.66395   0.365    0.716    
## z.lag.1     -0.73037    0.07946  -9.192 3.43e-16 ***
## tt           0.03624    0.01949   1.860    0.065 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.13 on 147 degrees of freedom
## Multiple R-squared:  0.365,  Adjusted R-squared:  0.3564 
## F-statistic: 42.25 on 2 and 147 DF,  p-value: 3.193e-15
## 
## 
## Value of test-statistic is: -9.192 28.1677 42.247 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47

To show how the test operates, we build the following linear model for our time series and compare its summary with that of the ADF unit root test against the alternative of autoregression with drift and linear trend, using the short lag length \(l=4\).

x <- Gauss_AR1_df$x_r
dx <- diff(x, differences=1)
tt <- Gauss_AR1_df$t[-1]
x_l1 <- x[-length(x)]
x_l2 <- c(NA,x_l1[-length(x_l1)])
x_l3 <- c(NA,x_l2[-length(x_l2)])
x_l4 <- c(NA,x_l3[-length(x_l3)])
dx_l1 <- c(NA,diff(x_l1, differences=1))
dx_l2 <- c(NA,diff(x_l2, differences=1))
dx_l3 <- c(NA,diff(x_l3, differences=1))
dx_l4 <- c(NA,diff(x_l4, differences=1))
ADF_AR1_df <- data.frame(dx, tt, x_l1, x_l2, x_l3, x_l4, dx_l1, dx_l2, dx_l3, dx_l4)
ADF_AR1_lm <- lm(dx ~ 1 + tt + x_l1 + dx_l1 + dx_l2 + dx_l3 + dx_l4, data=ADF_AR1_df)
summary(ADF_AR1_lm)
## 
## Call:
## lm(formula = dx ~ 1 + tt + x_l1 + dx_l1 + dx_l2 + dx_l3 + dx_l4, 
##     data = ADF_AR1_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.484  -7.828   1.347   6.879  25.034 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.39791    1.12609   2.129    0.035 *  
## tt           3.41724    2.15917   1.583    0.116    
## x_l1        -0.70656    0.15156  -4.662 7.27e-06 ***
## dx_l1       -0.01190    0.13850  -0.086    0.932    
## dx_l2       -0.05687    0.12270  -0.464    0.644    
## dx_l3       -0.04996    0.10459  -0.478    0.634    
## dx_l4       -0.07650    0.08526  -0.897    0.371    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.35 on 139 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.3709, Adjusted R-squared:  0.3437 
## F-statistic: 13.66 on 6 and 139 DF,  p-value: 3.743e-12
summary(y_ADF_urca_trend_short_lags)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.484  -7.828   1.347   6.879  25.034 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.68930    1.79645   0.384    0.702    
## z.lag.1     -0.70656    0.15156  -4.662 7.27e-06 ***
## tt           0.03417    0.02159   1.583    0.116    
## z.diff.lag1 -0.01190    0.13850  -0.086    0.932    
## z.diff.lag2 -0.05687    0.12270  -0.464    0.644    
## z.diff.lag3 -0.04996    0.10459  -0.478    0.634    
## z.diff.lag4 -0.07650    0.08526  -0.897    0.371    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.35 on 139 degrees of freedom
## Multiple R-squared:  0.3709, Adjusted R-squared:  0.3437 
## F-statistic: 13.66 on 6 and 139 DF,  p-value: 3.743e-12
## 
## 
## Value of test-statistic is: -4.6618 7.2643 10.8663 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47

The difference in the estimates of the drift and linear trend coefficients is due to the choice of the linear trend variable \(tt\), which in the ADF test is the index of the entries of the vector \(y\). Indeed, rewriting our linear model accordingly, the difference disappears.
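As a quick sanity check of this rescaling effect, the following toy regression (simulated data, unrelated to the series above) shows that multiplying the trend regressor by a constant divides its estimated coefficient by the same constant while leaving the fit unchanged:

```r
set.seed(1)
n  <- 100
t1 <- 1:n                 # trend measured in index units (as in the ADF test)
t2 <- t1 / 100            # same trend rescaled (as in Gauss_AR1_df$t)
y  <- 0.5 + 0.03 * t1 + rnorm(n)
fit1 <- lm(y ~ t1)
fit2 <- lm(y ~ t2)
# The slope on t2 is exactly 100 times the slope on t1.
all.equal(unname(coef(fit1)["t1"]) * 100, unname(coef(fit2)["t2"]))  # TRUE
```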

x <- Gauss_AR1_df$x_r
dx <- diff(x, differences=1)
tt <- 1:(length(x)-1)
x_l1 <- x[-length(x)]
x_l2 <- c(NA,x_l1[-length(x_l1)])
x_l3 <- c(NA,x_l2[-length(x_l2)])
x_l4 <- c(NA,x_l3[-length(x_l3)])
dx_l1 <- c(NA,diff(x_l1, differences=1))
dx_l2 <- c(NA,diff(x_l2, differences=1))
dx_l3 <- c(NA,diff(x_l3, differences=1))
dx_l4 <- c(NA,diff(x_l4, differences=1))
ADF_AR1_mod_df <- data.frame(dx, tt, x_l1, x_l2, x_l3, x_l4, dx_l1, dx_l2, dx_l3, dx_l4)
ADF_AR1_mod_lm <- lm(dx ~ 1 + tt + x_l1 + dx_l1 + dx_l2 + dx_l3 + dx_l4, data=ADF_AR1_mod_df)
summary(ADF_AR1_mod_lm)
## 
## Call:
## lm(formula = dx ~ 1 + tt + x_l1 + dx_l1 + dx_l2 + dx_l3 + dx_l4, 
##     data = ADF_AR1_mod_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.484  -7.828   1.347   6.879  25.034 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.68930    1.79645   0.384    0.702    
## tt           0.03417    0.02159   1.583    0.116    
## x_l1        -0.70656    0.15156  -4.662 7.27e-06 ***
## dx_l1       -0.01190    0.13850  -0.086    0.932    
## dx_l2       -0.05687    0.12270  -0.464    0.644    
## dx_l3       -0.04996    0.10459  -0.478    0.634    
## dx_l4       -0.07650    0.08526  -0.897    0.371    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.35 on 139 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.3709, Adjusted R-squared:  0.3437 
## F-statistic: 13.66 on 6 and 139 DF,  p-value: 3.743e-12

Note that in all variants of the ADF test considered the null hypothesis is rejected. However, this does not mean that the time series is generated by a stationary process. It means that the process generating the time series cannot contain a random walk component, but it may contain a linear trend. Such processes are referred to as trend stationary.
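To illustrate the notion of trend stationarity, the following sketch (simulated data, not the series analyzed above) builds a process given by AR(1) fluctuations around a deterministic linear trend; removing the fitted trend leaves a stationary series:

```r
set.seed(42)
n   <- 200
# Stationary AR(1) component with autoregressive coefficient 0.5.
eps <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
# Trend-stationary process: deterministic linear trend plus AR(1) fluctuations.
x   <- 2 + 0.1 * (1:n) + eps
# Detrending by OLS removes the deterministic component;
# the residuals are a realization of the stationary AR(1) part.
detrended <- residuals(lm(x ~ seq_len(n)))
mean(detrended)           # essentially zero by construction of OLS residuals
```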

Now, we consider the KPSS test, which reverses the perspective: its null hypothesis is that the time series is generated by a stationary process. The test allows one to specify either that the time series is stationary around a constant level, type=“mu”, or that it is stationary around a deterministic linear trend, type=“tau”.

The implementation of the KPSS test in the library urca also allows different choices for the number of lags. More specifically, for type=“mu” we have

y <- Gauss_AR1_df$x_r
y_KPSS_urca_mu_long <- ur.kpss(y, type="mu", lags="long")
summary(y_KPSS_urca_mu_long)
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 13 lags. 
## 
## Value of test-statistic is: 0.4047 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
y_KPSS_urca_mu_short <- ur.kpss(y, type="mu", lags="short")
summary(y_KPSS_urca_mu_short)
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.4731 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
y_KPSS_urca_mu_nil <- ur.kpss(y, type="mu", lags="nil")
summary(y_KPSS_urca_mu_nil)
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 0 lags. 
## 
## Value of test-statistic is: 0.7551 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

For type=“tau” we have

y_KPSS_urca_tau_long <- ur.kpss(y, type="tau", lags="long")
summary(y_KPSS_urca_tau_long)
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: tau with 13 lags. 
## 
## Value of test-statistic is: 0.0857 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.119 0.146  0.176 0.216
y_KPSS_urca_tau_short <- ur.kpss(y, type="tau", lags="short")
summary(y_KPSS_urca_tau_short)
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: tau with 4 lags. 
## 
## Value of test-statistic is: 0.0822 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.119 0.146  0.176 0.216
y_KPSS_urca_tau_nil <- ur.kpss(y, type="tau", lags="nil")
summary(y_KPSS_urca_tau_nil)
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: tau with 0 lags. 
## 
## Value of test-statistic is: 0.1202 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.119 0.146  0.176 0.216

Note that in ur.kpss the option lags=“long” chooses the lags according to Schwert’s formula (1989). Note, however, that on page \(48\) of the PDF documentation of the urca package Schwert’s formula is written incorrectly.

For both types considered, we cannot reject the null hypothesis at the \(5\%\) significance level.

Combining the results of the ADF and KPSS tests, we cannot reject the possibility that the time series is generated by an AR(1) process with drift and linear trend.

We consider a rough plot of the residuals of the ADF linear regression with drift, trend, and no lags, that is, of the model y_ADF_urca_trend_0_lags.

y <- y_ADF_urca_trend_0_lags@res
x <- 1:length(y)
plot(y, xlab="Indices", ylab="Residuals", main="Residuals of the ADF linear regression with drift, trend, and no lags")
abline(lm(y~x), col="red", lwd=1)

The rough plot shows no evidence of non-stationarity or heteroskedasticity.

Hence, we consider the plots of the autocorrelogram and partial autocorrelogram of the residuals.

y <- y_ADF_urca_trend_0_lags@res
length <- length(y)
maxlag <- ceiling(10*log10(length))
Aut_Fun_y <- acf(y, lag.max = maxlag, type="correlation", plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_Aut_Fun_y <- data.frame(lag=Aut_Fun_y$lag, acf=Aut_Fun_y$acf)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of Autocorrelogram of the Residuals of the ADF Linear Regression with Drift, Trend, and no Lags"))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_Aut_Fun_y, aes(x=lag, y=acf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=acf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="acf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

Plot of the partial autocorrelogram.

y <- y_ADF_urca_trend_0_lags@res
length <- length(y)
maxlag <- ceiling(10*log10(length))
P_Aut_Fun_y <- pacf(y, lag.max = maxlag, plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_P_Aut_Fun_y <- data.frame(lag=P_Aut_Fun_y$lag, pacf=P_Aut_Fun_y$acf)
First_Date <- paste(Data_df$Month[1],Data_df$Year[1])
Last_Date <- paste(Data_df$Month[length],Data_df$Year[length])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of Partial Autocorrelogram of the Residuals of the ADF Linear Regression with Drift, Trend, and no Lags"))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_P_Aut_Fun_y, aes(x=lag, y=pacf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=pacf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="pacf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

The plots do not show clear evidence of autocorrelation.

We apply the Ljung-Box (LB) test.

y <- y_ADF_urca_trend_0_lags@res
Box.test(y, lag = 1, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  y
## X-squared = 0.024738, df = 1, p-value = 0.875

In light of the evidence from the autocorrelograms and the Ljung-Box test, we cannot reject the null of no autocorrelation in the noise component of the ADF linear regression with drift, trend, and no lags.
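Since the LB statistic depends on the number of lags tested, it is prudent to repeat the test at several lag choices. A minimal sketch on a simulated white-noise series (not the residuals above):

```r
set.seed(123)
w <- rnorm(150)                      # simulated Gaussian white noise
# p-values of the Ljung-Box test at several lag choices;
# under the null of no autocorrelation, large p-values are expected.
p_values <- sapply(c(1, 5, 10, 20),
                   function(k) Box.test(w, lag = k, type = "Ljung-Box")$p.value)
p_values
```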

We consider the QQ-plot of the residuals of the ADF linear regression with drift, trend, and no lags against the normal distribution.

y <- y_ADF_urca_trend_0_lags@res
y_vs_Norm_QQ_plot <- qqnorm(y, 
                            main="Residuals of the ADF Linear Regression with Drift, Trend, and no Lags vs Normal Distrib. - QQ-plot",
                            xlab="Theoretical Quantiles", ylab="Residuals",
                            las=1, pch=16, cex=0.5, col="black")
qqline(y, col="black", lwd=1)
abline(a=0, b=1, col="green", lwd=1)
legend("topleft", 
       # legend=c("regression line", "interquartile line", "y = x line"),
       # col=c("red", "black", "green"), 
       legend=c("interquartile line", "y = x line"),
       col=c("black", "green"), 
       lty=1, lwd=0.1,
       cex=0.80, x.intersp=0.50, y.intersp=0.40, text.width=2, seg.len=1,
       inset=-0.01, bty="n")

The QQ-plot does not yield clear visual evidence of non-normality.

It is worth proceeding with other computational tests.

Jarque-Bera (JB) test.

# library(tseries)
y <- y_ADF_urca_trend_0_lags@res
y_JB <- jarque.bera.test(y)
show(y_JB)
## 
##  Jarque Bera Test
## 
## data:  y
## X-squared = 2.6121, df = 2, p-value = 0.2709

Shapiro-Wilk (SW) test.

# library(stats)
y <- y_ADF_urca_trend_0_lags@res
y_SW <- shapiro.test(y)
show(y_SW)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98892, p-value = 0.2825

D’Agostino-Pearson (DP) test.

# library(fBasics)
y <- y_ADF_urca_trend_0_lags@res
y_DP <- dagoTest(y)
show(y_DP)
## 
## Title:
##  D'Agostino Normality Test
## 
## Test Results:
##   STATISTIC:
##     Chi2 | Omnibus: 4.6789
##     Z3  | Skewness: -0.0746
##     Z4  | Kurtosis: -2.1618
##   P VALUE:
##     Omnibus  Test: 0.09638 
##     Skewness Test: 0.9406 
##     Kurtosis Test: 0.03063

As a result of the visual evidence and the normality tests, we cannot reject the normality hypothesis for the residuals of the ADF linear regression with drift, trend, and no lags.

In light of the collected evidence, we are led to build the following linear model for the time series.

x <- Gauss_AR1_df$x_r
tt <- Gauss_AR1_df$t
x_l1 <- c(NA,x[-length(x)])
AR1_df <- data.frame(x, tt, x_l1)
AR1_lm <- lm(x ~ 1 + tt + x_l1, data=AR1_df)
summary(AR1_lm)
## 
## Call:
## lm(formula = x ~ 1 + tt + x_l1, data = AR1_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.1092  -7.5774   0.7977   6.3690  23.1049 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.41870    0.99371   2.434 0.016132 *  
## tt           3.62426    1.94903   1.860 0.064953 .  
## x_l1         0.26963    0.07946   3.393 0.000887 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.13 on 147 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1101, Adjusted R-squared:  0.09799 
## F-statistic: 9.093 on 2 and 147 DF,  p-value: 0.0001891

12.2 Autoregressive Processes of Order \(p\) - AR(p) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\). We assume that the random variable \(X_{0}\) has finite moment of order \(2\) and set \(\mathbf{E}\left[X_{0}\right]\equiv\mu_{X_{0}}\) and \(Var\left(X_{0}\right)\equiv\Sigma_{X_{0}}^{2}\).

Definition 12.7 (AR(p) Processes) We say that \(\mathbf{X}\) is an autoregressive process of order \(p\), for some \(p\in\mathbb{N}\), if there exist \(\Phi_{1},\dots,\Phi_{p}\in\mathbb{R}^{N\times N}\) and a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\) on \(\Omega\) with states in \(\mathbb{R}^{N}\) and variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric matrix \(\Sigma_{\mathbf{W}}^{2}\), such that \(X_{0}\) is independent of \(\mathbf{W}\) and the random variables in \(\mathbf{X}\) satisfy the equations \[\begin{equation} \begin{array} [c]{l} X_{1}=\Phi_{1}X_{0}+W_{1},\\ X_{2}=\Phi_{1}X_{1}+\Phi_{2}X_{0}+W_{2},\\ X_{3}=\Phi_{1}X_{2}+\Phi_{2}X_{1}+\Phi_{3}X_{0}+W_{3},\\ \cdots\\ X_{p-1}=\Phi_{1}X_{p-2}+\Phi_{2}X_{p-3}+\cdots+\Phi_{p-2}X_{1}+\Phi_{p-1}X_{0}+W_{p-1},\\ X_{p}=\Phi_{1}X_{p-1}+\Phi_{2}X_{p-2}+\cdots+\Phi_{p}X_{0}+W_{p},\\ X_{t}=\Phi_{1}X_{t-1}+\Phi_{2}X_{t-2}+\cdots+\Phi_{p}X_{t-p}+W_{t},\quad\forall t\geq p. \end{array} \tag{12.66} \end{equation}\]

More generally,

Definition 12.8 (AR(p) processes with drift and linear trend) We say that \(\mathbf{X}\) is an \(N\)-variate real autoregressive process of order \(p\), for some \(p\in\mathbb{N}\), with drift and linear trend if there exist \(\alpha, \beta\in\mathbb{R}^{N}\), \(\Phi_{1},\dots,\Phi_{p}\in\mathbb{R}^{N\times N}\), and a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\) on \(\Omega\) with states in \(\mathbb{R}^{N}\) and variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric matrix \(\Sigma_{\mathbf{W}}^{2}\), such that \(X_{0}\) is independent of \(\mathbf{W}\) and we have \[\begin{equation} \begin{array} [c]{l} X_{1}=\alpha+\beta+\Phi_{1}X_{0}+W_{1},\\ X_{2}=\alpha+2\beta+\Phi_{1}X_{1}+\Phi_{2}X_{0}+W_{2},\\ X_{3}=\alpha+3\beta+\Phi_{1}X_{2}+\Phi_{2}X_{1}+\Phi_{3}X_{0}+W_{3},\\ \cdots\\ X_{p-1}=\alpha+\left(p-1\right)\beta+\Phi_{1}X_{p-2}+\Phi_{2}X_{p-3}+\cdots+\Phi_{p-2}X_{1}+\Phi_{p-1}X_{0}+W_{p-1},\\ X_{p}=\alpha+p\beta+\Phi_{1}X_{p-1}+\Phi_{2}X_{p-2}+\cdots+\Phi_{p}X_{0}+W_{p},\\ X_{t}=\alpha+\beta t+\Phi_{1}X_{t-1}+\Phi_{2}X_{t-2}+\cdots+\Phi_{p}X_{t-p}+W_{t},\quad\forall t\geq p, \end{array} \tag{12.67} \end{equation}\] The random vector \(X_{0}\) [resp. the distribution of \(X_{0}\)] is referred to as the initial state [resp. the initial distribution] of the autoregressive process \(\mathbf{X}\); in case \(X_{0}\equiv x_{0}\in\mathbb{R}^{N}\), we also call \(x_{0}\) the starting point of \(\mathbf{X}\); the vector \(\alpha\) [resp. \(\beta\)] is referred to as the drift [resp. the linear trend coefficient] of \(\mathbf{X}\); when we want to stress that \(\alpha\neq0\) and \(\beta=0\) [resp. \(\alpha=0\) and \(\beta\neq0\)] we call \(\mathbf{X}\) an autoregressive process with drift and no linear trend [resp. with linear trend and no drift]; the matrices \(\Phi_{1},\dots,\Phi_{p}\in\mathbb{R}^{N\times N}\) are referred to as the regression coefficients of \(\mathbf{X}\); the strong white noise \(\mathbf{W}\) is referred to as the state innovation of the autoregressive process \(\mathbf{X}\). The explicit reference to the state innovation \(\mathbf{W}\) is, however, often omitted when unnecessary.

To denote that \(\mathbf{X}\) is an \(N\)-variate real autoregressive process of order \(p\) we write \(\mathbf{X}\sim AR^{N}(p)\). In case \(N=1\), we usually speak of a real autoregressive process of order \(p\), omitting the mention of \(N\) and writing \(\mathbf{X}\sim AR(p)\). We also write \(\phi_{1},\dots,\phi_{p}\) for the autoregressive coefficients rather than \(\Phi_{1},\dots,\Phi_{p}\), and \(\sigma_{\mathbf{W}}^{2}\) for the variance of the innovation process rather than \(\Sigma_{\mathbf{W}}^{2}\).

In several circumstances, one considers a process \(\left(X_{t}\right)_{t\in\mathbb{T}_{p}}\equiv\mathbf{X}\) with time set \(\mathbb{T}_{p}\equiv\left\{1-p,2-p,\dots,-1\right\}\cup\mathbb{N}_{0}\), for some \(p\in\mathbb{N}\), where the negative indices \(1-p,2-p,\dots,-1\in\mathbb{T}_{p}\) are intended to represent past times with respect to the current time \(0\). In this case, we say that \(\mathbf{X}\) is an \(N\)-variate real autoregressive process of order \(p\) with drift and linear trend if there exist \(\alpha,\beta,\Phi_{1},\dots,\Phi_{p}\), and \(\mathbf{W}\) as in Definition 12.8 such that the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\alpha+\beta t+\Phi_{1}X_{t-1}+\Phi_{2}X_{t-2}+\cdots+\Phi_{p-1}X_{t-(p-1)}+\Phi_{p}X_{t-p}+W_{t}, \tag{12.68} \end{equation}\] for every \(t\in\mathbb{N}\). In this case, the random variables \(X_{1-p},X_{2-p},\dots,X_{-1}\) are called the past states of the process, and the random variables of the state innovation \(\mathbf{W}\) are assumed to be independent of all of \(X_{1-p},X_{2-p},\dots,X_{-1}\), and \(X_{0}\). Typically, relying on the argument that the realizations of the past states of a stochastic process have been observed, the random variables \(X_{1-p},X_{2-p},\dots,X_{-1},X_{0}\) are assumed to be Dirac random variables concentrated at some points \(x_{1-p},x_{2-p},\dots,x_{-1},x_{0}\in\mathbb{R}^{N}\).

In other circumstances, it is more appropriate to consider a process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) with time set \(\mathbb{T}\equiv\mathbb{Z}\). In this case, we say that \(\mathbf{X}\) is a \(p\)th-order \(N\)-variate autoregressive process with drift and linear trend if there exist \(\alpha,\beta,\Phi_{1},\dots,\Phi_{p}\) as in Definition 12.8 and a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{Z}}\equiv\mathbf{W}\) such that the random variables in \(\mathbf{X}\) satisfy \[\begin{equation} X_{t}=\alpha+\beta t+\Phi_{1}X_{t-1}+\Phi_{2}X_{t-2}+\cdots+\Phi_{p-1}X_{t-(p-1)}+\Phi_{p}X_{t-p}+W_{t}, \tag{12.69} \end{equation}\] for every \(t\in\mathbb{Z}\). In this case, the past, current, and future states of the process, corresponding to negative, zero, and positive time indices, respectively, are all intended to be random variables.

In what follows, we restrict our attention to \(AR(p)\) processes \(\mathbf{X}\) for which \(\alpha\), \(\beta\), and \(\phi_{1},\dots,\phi_{p}\) are all real numbers and \(\mathbf{W}\) is a real strong white noise with variance \(\sigma_{\mathbf{W}}^{2}\), for some \(\sigma_{\mathbf{W}}>0\).

Proposition 12.21 (Causal AR(p) processes) Under the assumption \(\alpha=\beta=0\), an \(AR(p)\) process \(\mathbf{X}\) is causal if and only if the roots of the polynomial \[\begin{equation} 1-\phi_{1}z-\phi_{2}z^{2}-\cdots-\phi_{p-1}z^{p-1}-\phi_{p}z^{p} \end{equation}\] lie outside the unit circle in the complex plane \(\mathbb{C}\).

Proposition 12.22 (Weakly stationary and ergodic AR(p) processes) Under the assumption \(\alpha=\beta=0\), an \(AR(p)\) process \(\mathbf{X}\) is weakly stationary and ergodic if and only if it is causal.
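The condition of Proposition 12.21 can be checked numerically with polyroot. As a sketch, for the coefficients \(\phi_{1}=0.66667\), \(\phi_{2}=-0.44444\), \(\phi_{3}=0.2963\) used in the example of the next subsection, all roots have modulus about \(1.5>1\), so the corresponding \(AR(3)\) process is causal:

```r
phi <- c(0.66667, -0.44444, 0.2963)
# Roots of the characteristic polynomial 1 - phi_1*z - phi_2*z^2 - phi_3*z^3;
# polyroot takes coefficients in increasing order of the powers of z.
roots <- polyroot(c(1, -phi))
Mod(roots)                 # all moduli are about 1.5, outside the unit circle
all(Mod(roots) > 1)        # TRUE: the causality condition holds
```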

12.2.1 Examples

We build an \(AR(3)\) process and consider some sample paths.

t <- seq(from=-0.49, to=1.00, length.out=150)     # Choosing the time set.
a <- 0.5                                          # Choosing the drift coefficient. 
b <- 5.0                                          # Choosing the linear trend coefficient.
z_1 <- 1.5                                        # Choosing the roots of the polynomial
z_2 <- 1.5i                                       # 1 - f_1*z - f_2*z^2 - f_3*z^3
z_3 <- -1.5i                                      # outside the unit circle of the complex plane.
f_1 <- 0.66667                                    # Determining the regression coefficients accordingly.
f_2 <- -0.44444
f_3 <- 0.2963

set.seed(12345, kind=NULL, normal.kind=NULL)      # Setting a random seed for reproducibility.

# Determining one path of the possible values of the Gaussian random variables in the state innovation process.
Gauss_r <- rnorm(n=150, mean=0, sd=9)              

# Setting an empty vector of length 150 to store the sample path of the AR(3) process, corresponding 
# to the sample path of the state innovation.
x_r <- rep(NA,150)                               

x0 <- 0                                           # Choosing the starting point of the AR(3) process.

# Determining the first point (after the starting point) of the sample path of the AR(3) process.
x_r[1] <- a + b*t[1] + f_1*x0 + Gauss_r[1]        

# Determining the second point of the sample path of the AR(3) process.
x_r[2] <- a + b*t[2] + f_1*x_r[1] + f_2*x0 + Gauss_r[2]

# Determining the third point of the sample path of the AR(3) process.
x_r[3] <- a + b*t[3] + f_1*x_r[2] + f_2*x_r[1] +  f_3*x0  + Gauss_r[3]

# Determining the other points of the sample path of the AR(3) process.
for (n in 4:150)
{x_r[n] <- a + b*t[n] + f_1*x_r[n-1] + f_2*x_r[n-2] + f_3*x_r[n-3]+ Gauss_r[n]} 

head(x_r)                                         # Initial part of the sample path of the AR(3) process.
## [1]   3.3197594   6.6983781   0.1564441  -7.7705605   0.4377870 -14.2698420
tail(x_r)                                         # Final part of the sample path of the AR(3) process.
## [1] -1.169399 12.071147 -1.970565  6.021766 21.981128 18.142186
# Setting another random seed for reproducibility and building another sample path of the AR(3) process.
set.seed(23451, kind=NULL, normal.kind=NULL)      

# Building another sample path of the Gaussian state innovation process, which retains the first 50 
# sample points of the former path.
Gauss_g <- replace(Gauss_r, c(51:150), rnorm(n=100, mean=0, sd=9))

# Setting an empty vector of length 150 to store the sample path of the AR(3) process, corresponding 
# to the sample path of the state innovation.
x_g <- rep(NA,150)                                 
                                                   
x0 <- 0                                           # Choosing the starting point of the AR(3) process.

# Determining the first point (after the starting point) of the sample path of the AR(3) process.
x_g[1] <- a + b*t[1] + f_1*x0 + Gauss_g[1]        
                                                  
# Determining the second point of the sample path of the AR(3) process.
x_g[2] <- a + b*t[2] + f_1*x_g[1] + f_2*x0 + Gauss_g[2]

# Determining the third point of the sample path of the AR(3) process.
x_g[3] <- a + b*t[3] + f_1*x_g[2] + f_2*x_g[1] +  f_3*x0  + Gauss_g[3]

# Determining the other points of the sample path of the AR(3) process.
for (n in 4:150)
{x_g[n] <- a + b*t[n] + f_1*x_g[n-1] + f_2*x_g[n-2] + f_3*x_g[n-3]+ Gauss_g[n]} 

head(x_g)                                         # Initial part of the sample path of the AR(3) process.
## [1]   3.3197594   6.6983781   0.1564441  -7.7705605   0.4377870 -14.2698420
tail(x_g)                                         # Final part of the sample path of the AR(3) process.
## [1] -0.1059164  9.2898520  7.8326888  8.3469073 14.3732713 12.0657228
set.seed(34512, kind=NULL, normal.kind=NULL)
Gauss_b <- replace(Gauss_r, c(51:150), rnorm(n=100, mean=0, sd=9))

x_b <- rep(NA,150)                               
x0 <- 0                                           
x_b[1] <- a + b*t[1] + f_1*x0 + Gauss_b[1]        
x_b[2] <- a + b*t[2] + f_1*x_b[1] + f_2*x0 + Gauss_b[2]
x_b[3] <- a + b*t[3] + f_1*x_b[2] + f_2*x_b[1] + f_3*x0  + Gauss_b[3]

for (n in 4:150)
{x_b[n] <- a + b*t[n] + f_1*x_b[n-1] + f_2*x_b[n-2] + f_3*x_b[n-3]+ Gauss_b[n]} 

head(x_b)                                         
## [1]   3.3197594   6.6983781   0.1564441  -7.7705605   0.4377870 -14.2698420
tail(x_b)                                        
## [1]  4.941584 11.207498 20.759704 15.660101  7.439303  2.053446
# Generating a data frame from the time variable and the three paths of the AR(3) process.
Gauss_AR3_df <- data.frame(t,x_r,x_b,x_g)         
 
head(Gauss_AR3_df)
##       t         x_r         x_b         x_g
## 1 -0.49   3.3197594   3.3197594   3.3197594
## 2 -0.48   6.6983781   6.6983781   6.6983781
## 3 -0.47   0.1564441   0.1564441   0.1564441
## 4 -0.46  -7.7705605  -7.7705605  -7.7705605
## 5 -0.45   0.4377870   0.4377870   0.4377870
## 6 -0.44 -14.2698420 -14.2698420 -14.2698420
tail(Gauss_AR3_df)
##        t       x_r       x_b        x_g
## 145 0.95 -1.169399  4.941584 -0.1059164
## 146 0.96 12.071147 11.207498  9.2898520
## 147 0.97 -1.970565 20.759704  7.8326888
## 148 0.98  6.021766 15.660101  8.3469073
## 149 0.99 21.981128  7.439303 14.3732713
## 150 1.00 18.142186  2.053446 12.0657228
# library(dplyr)
# Adding a row to represent the starting point of the AR(3) process.
Gauss_AR3_df <- add_row(Gauss_AR3_df,  t=-0.50, x_r=0, x_b=0, x_g=0, .before=1) 
head(Gauss_AR3_df)
##       t        x_r        x_b        x_g
## 1 -0.50  0.0000000  0.0000000  0.0000000
## 2 -0.49  3.3197594  3.3197594  3.3197594
## 3 -0.48  6.6983781  6.6983781  6.6983781
## 4 -0.47  0.1564441  0.1564441  0.1564441
## 5 -0.46 -7.7705605 -7.7705605 -7.7705605
## 6 -0.45  0.4377870  0.4377870  0.4377870
tail(Gauss_AR3_df)
##        t       x_r       x_b        x_g
## 146 0.95 -1.169399  4.941584 -0.1059164
## 147 0.96 12.071147 11.207498  9.2898520
## 148 0.97 -1.970565 20.759704  7.8326888
## 149 0.98  6.021766 15.660101  8.3469073
## 150 0.99 21.981128  7.439303 14.3732713
## 151 1.00 18.142186  2.053446 12.0657228

We plot the paths of the \(AR(3)\) process. First, the scatter plot

# library(ggplot2)
Data_df <- Gauss_AR3_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Three Paths of a Gaussian AR(3) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  regression par. ", phi[1]==.(f_1),", ", phi[2]==.(f_2),", ", phi[3]==.(f_3),","),
                                paste("state innovation random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)-min(Data_df$x_r,Data_df$x_b,Data_df$x_g))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_b_col <- bquote("random seed" ~  23451)
x_g_col <- bquote("random seed" ~  34512)
leg_labs <- c(x_k_col, x_r_col, x_b_col, x_g_col)
leg_ord <- c("x_k_col", "x_r_col", "x_b_col", "x_g_col")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_b_col"="blue", "x_g_col"="green")
AR3_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_point(alpha=1, size=1, aes(y=x_r, color="x_r_col")) +
  geom_point(alpha=1, size=1, aes(y=x_b, color="x_b_col")) +
  geom_point(alpha=1, size=1, aes(y=x_g, color="x_g_col")) +
  geom_point(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(AR3_sp)

Second, the line plot

# library(ggplot2)
Data_df <- Gauss_AR3_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of Three Paths of a Gaussian AR(3) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  regression par. ", phi[1]==.(f_1),", ", phi[2]==.(f_2),", ", phi[3]==.(f_3),","),
                                paste("state innovation random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)-min(Data_df$x_r,Data_df$x_b,Data_df$x_g))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <-  0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_b_col <- bquote("random seed" ~  23451)
x_g_col <- bquote("random seed" ~  34512)
leg_labs <- c(x_k_col, x_r_col, x_b_col, x_g_col)
leg_ord <- c("x_k_col", "x_r_col", "x_b_col", "x_g_col")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_b_col"="blue", "x_g_col"="green")
AR3_lp <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_line(alpha=1, size=0.6, aes(y=x_b, color="x_b_col"), group=1) +
  geom_line(alpha=1, size=0.6, aes(y=x_g, color="x_g_col"), group=1) +
  geom_line(alpha=1, size=0.6, aes(y=x_r, color="x_r_col"), group=1) +
  geom_line(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(AR3_lp)

Visual inspection of both the scatter and line plots shows that the three paths of the AR(3) process give some evidence of a trend, but no evidence of seasonality. Moreover, there is no visual evidence of heteroskedasticity.

We concentrate on the analysis of the black-red path, characterized by random seed 12345.

The scatter plot

# library(ggplot2)
Data_df <- Gauss_AR3_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of the Black-Red Path of a Gaussian AR(3) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  regression par. ", phi[1]==.(f_1),", ", phi[2]==.(f_2),", ", phi[3]==.(f_3),","),
                                paste("state innovation random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r)-min(Data_df$x_r))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <-  0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_rgrln <- bquote("Regression Line")
x_loess <- bquote("LOESS Curve")
leg_labs <- c(x_k_col, x_r_col, x_rgrln, x_loess)
leg_ord  <- c("x_k_col", "x_r_col", "x_rgrln", "x_loess")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_rgrln"="blue", "x_loess"="green")
AR3_r_sp <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.5, linetype="solid", aes(x=t, y=x_r, color="x_rgrln"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=x_r, color="x_loess"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1, aes(y=x_r, color="x_r_col")) +
  geom_point(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
 scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord,
                     guide=guide_legend(override.aes=list(shape=c(NA,NA,NA,NA),
                     linetype=c("dotted", "dotted", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(AR3_r_sp)

The line plot

# library(ggplot2)
Data_df <- Gauss_AR3_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of the Black-Red Path of a Gaussian AR(3) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  regression par. ", phi[1]==.(f_1),", ", phi[2]==.(f_2),", ", phi[3]==.(f_3),","),
                                paste("state innovation random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r)-min(Data_df$x_r))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <-  0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_rgrln <- bquote("Regression Line")
x_loess <- bquote("LOESS Curve")
leg_labs <- c(x_k_col, x_r_col, x_rgrln, x_loess)
leg_ord  <- c("x_k_col", "x_r_col", "x_rgrln", "x_loess")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_rgrln"="blue", "x_loess"="green")
AR3_r_lp <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.5, linetype="solid", aes(x=t, y=x_r, color="x_rgrln"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=x_r, color="x_loess"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.6, aes(y=x_r, color="x_r_col"), group=1) +
  geom_line(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  #  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord,
                     guide=guide_legend(override.aes=list(linetype=c("solid", "solid", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(AR3_r_lp)

Note that the shape of the LOESS curve suggests a mean reversion of the data around the regression line. In terms of visual inspection for evidence of trend and seasonality, it is also useful to consider the correlograms of the time series.

Plot of the autocorrelogram.

y <- Gauss_AR3_df$x_r
length <- length(y)
maxlag <- ceiling(10*log10(length))
Aut_Fun_y <- acf(y, lag.max = maxlag, type="correlation", plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_Aut_Fun_y <- data.frame(lag=Aut_Fun_y$lag, acf=Aut_Fun_y$acf)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of the Autocorrelogram of the Black-Red Path of a Gaussian AR(3) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_Aut_Fun_y, aes(x=lag, y=acf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=acf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="acf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

Plot of the partial autocorrelogram.

y <- Gauss_AR3_df$x_r
length <- length(y)
maxlag <- ceiling(10*log10(length))
P_Aut_Fun_y <- pacf(y, lag.max = maxlag, plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_P_Aut_Fun_y <- data.frame(lag=P_Aut_Fun_y$lag, pacf=P_Aut_Fun_y$acf)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of the Partial Autocorrelogram of the Black-Red Path of a Gaussian AR(3) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date)))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_P_Aut_Fun_y, aes(x=lag, y=pacf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=pacf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="pacf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

From these plots, we have no visual evidence of seasonality and no visual evidence of a strong trend component. In addition, the partial autocorrelogram gives some visual evidence of autocorrelation of AR(3) or AR(4) type.

In light of the visual evidence from the scatter and line plots and from the correlograms, there is overall evidence that the process generating the time series might be an AR(3) or AR(4) process with drift and linear trend.
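As a rough numerical cross-check (a sketch, not part of the original analysis), one might fit an AR(3) model with a linear trend regressor to the black-red path via stats::arima(). Note that in this regression-with-AR(3)-errors specification the trend enters differently than in the recursion used to generate the data, where it appears inside the autoregression itself, so the estimates are only indicative:

```r
# Sketch: fitting an AR(3)-plus-linear-trend model to the simulated black-red path.
# The regression-with-AR(3)-errors specification of arima() differs from the
# generating recursion, so the estimates are only indicative.
t <- seq(from=-0.49, to=1.00, length.out=150)
set.seed(12345)
w <- rnorm(150, mean=0, sd=9)
x <- rep(NA, 150); x0 <- 0
x[1] <- 0.5 + 5*t[1] + 0.66667*x0 + w[1]
x[2] <- 0.5 + 5*t[2] + 0.66667*x[1] - 0.44444*x0 + w[2]
x[3] <- 0.5 + 5*t[3] + 0.66667*x[2] - 0.44444*x[1] + 0.2963*x0 + w[3]
for (n in 4:150)
  x[n] <- 0.5 + 5*t[n] + 0.66667*x[n-1] - 0.44444*x[n-2] + 0.2963*x[n-3] + w[n]
fit <- arima(x, order=c(3,0,0), xreg=t)   # AR(3) errors plus a linear trend regressor
coef(fit)[c("ar1", "ar2", "ar3")]         # compare with phi_1, phi_2, phi_3
```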

13 Moving Average (MA) Processes

13.1 Moving Average Processes of Order \(1\) - MA(1) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\).

Definition 13.1 (MA(1) Processes) We say that \(\mathbf{X}\) is an \(N\)-variate moving average process of order \(1\) if there exist a vector \(\mu\in\mathbb{R}^{N}\), a matrix \(\Theta\in\mathbb{R}^{N}\times\mathbb{R}^{N}\), and an \(N\)-variate strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{W}\) on \(\Omega\) with variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric \(\Sigma_{\mathbf{W}}^{2}\) in \(\mathbb{R}^{N}\times\mathbb{R}^{N}\), such that the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\mu + W_{t}-\Theta W_{t-1}, \tag{13.1} \end{equation}\] for every \(t\in\mathbb{N}\). The vector \(\mu\) [resp. the matrix \(\Theta\)] is referred to as the mean [resp. the memory weight] parameter of the process \(\mathbf{X}\); the strong white noise \(\mathbf{W}\) is referred to as the state innovation of the moving average process \(\mathbf{X}\). When we want to stress that \(\mu=0\), the moving average process \(\mathbf{X}\) is said to be demeaned. Note that the explicit reference to the state innovation \(\mathbf{W}\) is often omitted. Note also that when \(\Theta\equiv 0\), Equation (13.1) characterizes a white noise with drift.

To denote that \(\mathbf{X}\) is a moving average process of order \(1\) with states in \(\mathbb{R}^N\), we write \(\mathbf{X}\sim MA(1)^{N}\). In the case \(N=1\), we usually speak of a real moving average process of order \(1\), neglect \(N\), and write \(\theta\) for the memory weight parameter rather than \(\Theta\).
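As a minimal sketch (with illustrative values \(\mu=2\) and \(\theta=0.7\), not taken from the text), a real \(MA(1)\) path can be simulated directly from Equation (13.1):

```r
# Sketch: simulating a real MA(1) process X_t = mu + W_t - theta*W_{t-1}
# with illustrative parameters mu = 2 and theta = 0.7.
set.seed(1)
n  <- 500
mu <- 2; theta <- 0.7
w  <- rnorm(n + 1, mean=0, sd=1)          # W_0, W_1, ..., W_n
x  <- mu + w[2:(n + 1)] - theta * w[1:n]  # X_1, ..., X_n
head(x)
```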

In some circumstances, it is more appropriate to consider a moving average process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) and a state innovation \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) with time set \(\mathbb{T}\equiv\mathbb{Z}\). In this case, we say that \(\mathbf{X}\) is a moving average process of order \(1\) if the random variables in \(\mathbf{X}\) satisfy Equation (13.1), for every \(t\in\mathbb{Z}\).

Unless otherwise specified, in what follows we deal with real \(MA\left(1\right)\) processes \(\mathbf{X}\) with non-zero mean satisfying Equation (13.1), for which the state innovation \(\mathbf{W}\) is a real strong white noise with variance \(\sigma_{\mathbf{W}}^{2}\), for some \(\sigma_{\mathbf{W}}>0\), and the parameters \(\mu\) and \(\theta\) are both real numbers. Recall that in this case the autocovariance and autocorrelation functions of \(\mathbf{X}\) are symmetric, that is, \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\gamma_{\mathbf{X}}\left(t,s\right) \quad\text{and}\quad \rho_{\mathbf{X}}\left(s,t\right)=\rho_{\mathbf{X}}\left(t,s\right), \end{equation}\] for all \(s,t\in\mathbb{N}\). Furthermore, we will make the technical assumption \(\theta\neq 0\) to distinguish a moving average process of order \(1\) from a white noise.

The following proposition, though rather trivial, is useful to stress the role of \(MA(1)\) processes as noises. It should be compared with the analogous, less trivial, claim concerning \(AR(1)\) processes (see Proposition 12.1).

Proposition 13.1 (MA(1) Processes as noises) Assume that \(\left(X_{t}\right)_{t\in\mathbb{N}}\equiv \mathbf{X}\) is an \(MA(1)\) process with state innovation \(\left(W_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{W}\) satisfying Equation (13.1), for some \(\mu,\theta\in\mathbb{R}\). Then we can write \[\begin{equation} X_{t}=\mu+Y_{t}, \tag{13.2} \end{equation}\] for every \(t\in\mathbb{N}\), where \(\left(Y_{t}\right)_{t\in\mathbb{N}_{0}}\equiv \mathbf{Y}\) is a demeaned \(MA(1)\) process with state innovation \(\mathbf{W}\) that solves \[\begin{equation} Y_{t}=W_{t}-\theta W_{t-1}, \tag{13.3} \end{equation}\] for every \(t\in\mathbb{N}\).

Let \(\left(X_{t}\right)_{t\in\mathbb{N}}\equiv \mathbf{X}\) be an \(MA(1)\) process satisfying Equation (13.1), for some \(\mu,\theta\in\mathbb{R}\) and some state innovation \(\left(W_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{W}\).

Proposition 13.2 (Representation of MA(1) processes) We have \[\begin{equation} X_{t}=\mu\sum_{s=0}^{t-1}\theta^{s}-\sum_{s=1}^{t-1}\theta^{t-s}X_{s}+W_{t}-\theta^{t}W_{0}, \tag{13.4} \end{equation}\] for every \(t\in\mathbb{N}\) such that \(t\geq 2\).

Proof. From Equation (13.1), we have \[\begin{equation} W_{1}=-\mu+X_{1}+\theta W_{0}. \tag{13.5} \end{equation}\] Therefore, considering (13.1) for \(t=2\), we can write \[\begin{equation} X_{2}=\mu+W_{2}-\theta\left(-\mu+X_{1}+\theta W_{0}\right) =\mu\left(1+\theta\right)-\theta X_{1}+W_{2}-\theta^{2}W_{0}, \end{equation}\] which is Equation (13.4) for \(t=2\). Now, assume inductively that (13.4) holds true for some \(t\geq2\) and consider the case \(t+1\). From (13.4), we obtain \[\begin{equation} W_{t}=X_{t}-\mu\sum_{s=0}^{t-1}\theta^{s}+\sum_{s=1}^{t-1}\theta^{t-s}X_{s}+\theta^{t}W_{0}. \tag{13.6} \end{equation}\] Hence, considering (13.1) for \(t+1\) and (13.6), it follows that \[\begin{align} X_{t+1}&=\mu+W_{t+1}-\theta W_{t}\\ & =\mu +W_{t+1}-\theta \left( X_{t}-\mu \sum_{s=0}^{t-1}\theta^{s} +\sum_{s=1}^{t-1}\theta ^{t-s}X_{s}+\theta ^{t}W_{0}\right) \\ & =\mu +\mu \sum_{s=0}^{t-1}\theta ^{s+1}-\theta X_{t} -\sum_{s=1}^{t-1}\theta ^{t+1-s}X_{s}+W_{t+1}-\theta^{t+1}W_{0} \\ & =\mu \sum_{s=0}^{t}\theta ^{s}-\sum_{s=1}^{t}\theta^{t+1-s}X_{s}+W_{t+1}-\theta^{t+1}W_{0}. \end{align}\] The latter is the desired Equation (13.4) in the case \(t+1\). This proves that Equation (13.4) holds true for every \(t\in\mathbb{N}\) with \(t\geq2\).
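Equation (13.4) can also be checked numerically on a short simulated path (a sketch with illustrative values \(\mu=2\) and \(\theta=0.7\)):

```r
# Sketch: numerical check of the representation (13.4) at t = 10
# on a simulated real MA(1) path (illustrative mu = 2, theta = 0.7).
set.seed(1)
mu <- 2; theta <- 0.7
w <- rnorm(11)                        # W_0, W_1, ..., W_10 (w[1] is W_0)
x <- mu + w[2:11] - theta * w[1:10]   # X_1, ..., X_10
t <- 10
rhs <- mu * sum(theta^(0:(t-1))) -
  sum(theta^(t - (1:(t-1))) * x[1:(t-1)]) +
  w[t + 1] - theta^t * w[1]
all.equal(x[t], rhs)                  # TRUE, up to floating-point error
```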

The following corollaries are immediate consequences of Equation (13.4).

Corollary 13.1 (MA(1) processes as adapted processes) Let \(\left(\mathcal{F}_{t}^{\mathbf{W}}\right)_{t\in\mathbb{N_{0}}}\equiv\mathfrak{F}^{\mathbf{W}}\) be the filtration generated by the innovation process \(\mathbf{W}\), that is, \(\mathcal{F}_{t}^{\mathbf{W}}\equiv\sigma\left(W_{0},W_{1},\dots,W_{t}\right)\), for every \(t\in\mathbb{N_{0}}\). Then the \(MA\left(1\right)\) process \(\mathbf{X}\) is adapted to \(\mathfrak{F}^{\mathbf{W}}\).

Corollary 13.2 (MA(1) processes as Kth-order processes) If the white noise \(\mathbf{W}\) is a \(K\)th-order process, for some \(K\geq2\), then the \(MA\left(1\right)\) process \(\mathbf{X}\) is also a \(K\)th-order process.

Corollary 13.3 (MA(1) processes as Markov processes) The \(MA\left(1\right)\) process \(\mathbf{X}\) is not a Markov process.

Corollary 13.4 (Independence of the random variables in MA(1) processes from state innovations) The random variables \(X_{1},\dots,X_{t}\) in the \(MA\left(1\right)\) process \(\mathbf{X}\) are independent of the random variables \(W_{t+1},W_{t+2},\dots\) of the state innovation process \(\mathbf{W}\) for every \(t\in \mathbb{N}\).

Proof. The claim is a consequence of Equation (13.4).

Proposition 13.3 (Mean function of MA(1) processes) The mean function \(\mu_{\mathbf{X}}:\mathbb{N}\rightarrow\mathbb{R}\) of an \(MA(1)\) process \(\mathbf{X}\) is given by \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\mu, \tag{13.7} \end{equation}\] for every \(t\in\mathbb{N}\).

Proof. Since \(\mathbf{W}\) is a strong white noise, considering Equation (13.1), thanks to the properties of the expectation operator, we have \[\begin{equation} \mathbf{E}\left[X_{t}\right]=\mathbf{E}\left[\mu+W_{t}-\theta W_{t-1}\right] =\mu+\mathbf{E}\left[W_{t}\right]-\theta\mathbf{E}\left[W_{t-1}\right]=\mu, \end{equation}\] for every \(t\in\mathbb{N}\).

Proposition 13.4 (Variance function of MA(1) processes) The variance function \(\sigma_{\mathbf{X}}^{2}:\mathbb{N}\rightarrow\mathbb{R}\) of an \(MA(1)\) process \(\mathbf{X}\) is given by \[\begin{equation} \sigma_{\mathbf{X}}^{2}\left(t\right)=\left(1+\theta^{2}\right)\sigma_{\mathbf{W}}^{2}, \tag{13.8} \end{equation}\] for every \(t\in\mathbb{N}\).

Proof. Since \(\mathbf{W}\) is a strong white noise, considering Equation (13.1), thanks to the properties of the variance operator, we have \[\begin{equation} \mathbf{D}^{2}\left[X_{t}\right]=\mathbf{D}^{2}\left[\mu+W_{t}-\theta W_{t-1}\right] =\mathbf{D}^{2}\left[W_{t}\right]+\theta^{2}\mathbf{D}^{2}\left[W_{t-1}\right] =\left(1+\theta^{2}\right)\sigma_{\mathbf{W}}^{2}, \end{equation}\] for every \(t\in\mathbb{N}\).

Proposition 13.5 (Autocovariance function of MA(1) processes) The autocovariance function \(\gamma_{\mathbf{X}}:\mathbb{N}\times\mathbb{N}\rightarrow\mathbb{R}\) of an \(MA(1)\) process \(\mathbf{X}\) is given by \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} \left(1+\theta^{2}\right)\sigma_{\mathbf{W}}^{2}, & \text{if }t-s=0,\\ -\theta\sigma_{\mathbf{W}}^{2}, & \text{if }\left\vert t-s\right\vert =1,\\ 0, & \text{if }\left\vert t-s\right\vert >1. \end{array} \right. \tag{13.9} \end{equation}\]

Proof. Considering Equation (13.1), thanks to the properties of the covariance functional, we have \[\begin{align} \gamma_{\mathbf{X}}\left(s,t\right) & =Cov\left(X_{s},X_{t}\right) =Cov\left(\mu+W_{s}-\theta W_{s-1},\mu+W_{t}-\theta W_{t-1}\right) \\ & =Cov\left(W_{s},W_{t}\right)-\theta Cov\left(W_{s},W_{t-1}\right) -\theta Cov\left(W_{s-1},W_{t}\right)+\theta^{2}Cov\left(W_{s-1},W_{t-1}\right). \end{align}\] Now, since \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\), for some \(\sigma_{\mathbf{W}}>0\), we have \[\begin{equation} Cov\left(W_{s},W_{t}\right)=\left\{ \begin{array} [c]{ll} \sigma_{\mathbf{W}}^{2}, & \text{if }t=s,\\ 0, & \text{if }t\neq s. \end{array} \right. \end{equation}\] It follows that \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} \left(1+\theta^{2}\right)\sigma_{\mathbf{W}}^{2}, & \text{if }t=s,\\ -\theta\sigma_{\mathbf{W}}^{2}, & \text{if }t=s+1\ \text{or }t=s-1,\\ 0, & \text{if }t>s+1\ \text{or }t<s-1. \end{array} \right. \end{equation}\] The latter clearly implies (13.9).

Proposition 13.6 (Autocorrelation function of MA(1) processes) The autocorrelation function \(\rho_{\mathbf{X}}:\mathbb{N}\times\mathbb{N}\rightarrow\mathbb{R}\) of an \(MA(1)\) process \(\mathbf{X}\) is given by \[\begin{equation} \rho_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} 1, & \text{if }t-s=0,\\ -\frac{\theta}{1+\theta^{2}}, & \text{if }\left\vert t-s\right\vert =1,\\ 0, & \text{if }\left\vert t-s\right\vert >1. \end{array} \right. \tag{13.10} \end{equation}\]

Proof. It follows immediately by combining Equations (13.8) and (13.9).

Corollary 13.5 (Weak stationarity for MA(1) processes) An \(MA\left(1\right)\) process \(\mathbf{X}\) is weak sense stationary. In particular, we can consider the reduced autocovariance and autocorrelation functions of the process \(\mathbf{X}\) referred to \(0\), which are given by \[\begin{equation} \gamma_{\mathbf{X},0}\left(t\right)=\left\{ \begin{array} [c]{ll} \left(1+\theta^{2}\right)\sigma_{\mathbf{W}}^{2}, & \text{if }t=0,\\ -\theta\sigma_{\mathbf{W}}^{2}, & \text{if }t=1,\\ 0, & \text{if }t>1, \end{array} \right. \tag{13.11} \end{equation}\] and \[\begin{equation} \rho_{\mathbf{X},0}\left(t\right)=\left\{ \begin{array} [c]{ll} 1, & \text{if }t=0,\\ -\frac{\theta}{1+\theta^{2}}, & \text{if }t=1,\\ 0, & \text{if }t>1. \end{array} \right. \tag{13.12} \end{equation}\]

Proof. Considering Definition 7.1, the weak stationarity is an immediate consequence of Equations (13.7) and (13.9).
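The reduced autocorrelation function (13.12) can be checked numerically with the base R function `ARMAacf`. A small caveat on conventions: `ARMAacf` (like most R routines) writes the MA part with a plus sign, \(X_{t}=W_{t}+\theta_{1}W_{t-1}\), so the memory parameter \(\theta\) of Equation (13.1) must be passed as `ma = -theta`. A minimal sketch, assuming the illustrative value \(\theta=0.7\):

```r
# Numerical check of Equation (13.12) via stats::ARMAacf.
# ARMAacf uses the plus-sign convention X_t = W_t + theta1*W_{t-1},
# hence we pass ma = -theta to match Equation (13.1).
theta <- 0.7
acf_theor <- ARMAacf(ma = -theta, lag.max = 3)
acf_theor                                  # autocorrelations at lags 0, 1, 2, 3
rho_1 <- -theta/(1 + theta^2)              # Equation (13.12) at t = 1
all.equal(unname(acf_theor["1"]), rho_1)   # TRUE
acf_theor[c("2", "3")]                     # both 0, as stated in Equation (13.12)
```

The same check works for any \(\theta\in\mathbb{R}\), since Equation (13.12) holds without restrictions on \(\theta\).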

Corollary 13.6 (Yule-Walker equation for MA(1) processes) We have \[\begin{equation} \gamma_{\mathbf{X}}\left(t,t-1\right)=-\frac{\theta}{1+\theta^{2}}\gamma_{\mathbf{X}}\left(t,t\right), \tag{13.13} \end{equation}\] for every \(t\in\mathbb{N}\).

Proof. Equation (13.13) is an immediate consequence of Equation (13.9).

Remark (**Uncorrelatedness of the random variables in MA(1) processes**). The random variables \(X_{s}\) and \(X_{t}\) in \(\mathbf{X}\) are uncorrelated for all \(s,t\in\mathbb{N}\) such that \(\left\vert t-s\right\vert >1\).

Proposition 13.7 (Ergodicity for MA(1) processes) An \(MA\left(1\right)\) process \(\mathbf{X}\) is mean square ergodic in the mean. Moreover, if the state innovation \(\mathbf{W}\) is a \(4\)th order process then \(\mathbf{X}\) is mean square ergodic in the wide sense.

Proof. The mean square ergodicity in the mean follows from the Slutsky theorem. With regard to the wide sense ergodicity, observe that we can write \[\begin{equation} X_{1}X_{k}=X_{1}\left(\mu+W_{k}-\theta W_{k-1}\right)=\mu X_{1}+X_{1}W_{k}-\theta X_{1}W_{k-1} \end{equation}\] and \[\begin{align} X_{t}X_{s+t} & =\left(\mu+W_{t}-\theta W_{t-1}\right)\left(\mu+W_{s+t}-\theta W_{s+t-1}\right) \\ & =\mu^{2}+\mu\left(W_{t}+W_{s+t}\right)-\mu\theta\left(W_{t-1}+W_{s+t-1}\right) -\theta\left(W_{t-1}W_{s+t}+W_{t}W_{s+t-1}\right)+W_{t}W_{s+t}+\theta^{2}W_{t-1}W_{s+t-1}. \end{align}\] Now, for all \(s,k,t\in\mathbb{N}\) such that \(t>k+1\) and \(t>2\), each of the random variables \(W_{t-1}\), \(W_{t}\), \(W_{s+t-1}\), \(W_{s+t}\) appearing in the expansion of \(X_{t}X_{s+t}\) is independent of \(X_{1}\), \(W_{k-1}\), and \(W_{k}\), by Corollary 13.4 and the independence of the innovations. Since, in addition, the innovations have zero mean, the covariance of each term of \(X_{1}X_{k}\) with each term of \(X_{t}X_{s+t}\) vanishes, whence \[\begin{equation} Cov\left(X_{1}X_{k},X_{t}X_{s+t}\right)=0, \end{equation}\] for all such \(s,k,t\). By virtue of Theorem …, the desired result follows.
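The ergodicity statement can be illustrated (though of course not proved) by simulation: along a single long sample path, the time average and the lag-one time autocovariance approach \(\mu\) and \(-\theta\sigma_{\mathbf{W}}^{2}\), respectively. A sketch with the illustrative values \(\mu=0.5\), \(\theta=0.7\), \(\sigma_{\mathbf{W}}=1\); note that `arima.sim` adopts the plus-sign MA convention, hence `ma = -theta`:

```r
# Illustration of mean square ergodicity along a single long path.
set.seed(12345)
mu <- 0.5; theta <- 0.7; sigma_W <- 1
# arima.sim uses X_t = W_t + theta1*W_{t-1}; pass ma = -theta for (13.1).
x_sim <- mu + arima.sim(model = list(ma = -theta), n = 1e5, sd = sigma_W)
mean(x_sim)                                # close to mu = 0.5
g1 <- acf(x_sim, lag.max = 1, type = "covariance", plot = FALSE)$acf[2]
g1                                         # close to -theta*sigma_W^2 = -0.7
```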

Proposition 13.8 (Gaussian MA(1) processes) Assume that the state innovation process \(\mathbf{W}\) is Gaussian, in symbols \(\mathbf{W}\sim GWN(\sigma_{\mathbf{W}}^{2})\), for some \(\sigma_{\mathbf{W}}>0\). Then we have \[\begin{equation} X_{t}\sim N\left(\mu_{\mathbf{X}}\left(t\right),\sigma_{\mathbf{X}}^{2}\left(t\right)\right), \end{equation}\] for every \(t\in\mathbb{N}\), where \(\mu_{\mathbf{X}}\left(t\right)\) and \(\sigma_{\mathbf{X}}^{2}\left(t\right)\) are given by (13.7) and (13.8), respectively. In addition, the process \(\mathbf{X}\) is Gaussian.

Proof. Thanks to the independence and Gaussianity of the random variables on the right hand side of Equation (13.1), the stated distribution of \(X_{t}\) immediately follows. Moreover, every random vector \(\left(X_{t_{1}},\dots,X_{t_{n}}\right)\) extracted from \(\mathbf{X}\) is an affine function of the jointly Gaussian random variables \(W_{0},W_{1},\dots\), so that all finite-dimensional distributions of \(\mathbf{X}\) are Gaussian, which yields the Gaussianity of the process \(\left(X_{t}\right)_{t\in\mathbb{N}}\).

Definition 13.2 (Gaussian MA(1) processes) In light of Proposition 13.8, we call Gaussian an \(MA(1)\) process \(\mathbf{X}\) with Gaussian state innovation \(\mathbf{W}\).

13.1.1 Parameter Estimation

Let \(\left(x_{t}\right)_{t=1}^{T}\equiv\mathbf{x}\) be a univariate real time series, for some \(T\geq2\), and let \(\left(X_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{X}\) be an \(MA\left(1\right)\) process, which satisfies Equation (13.1), for some mean parameter \(\mu\in\mathbb{R}\), some memory parameter \(\theta\in\mathbb{R}\), and a strong white noise state innovation process \(\left(W_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{W}\). In symbols, \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\) for some standard deviation parameter \(\sigma_{\mathbf{W}}>0\).

We assume that the \(MA\left(1\right)\) process \(\mathbf{X}\) is a model of \(\mathbf{x}\) for suitable values of the parameters \(\mu\), \(\theta\), and \(\sigma_{\mathbf{W}}\). The goal is to determine estimates \(\hat{\mu}_{T}\left(\omega\right)\), \(\hat{\theta}_{T}\left(\omega\right)\), and \(\hat{\sigma}_{\mathbf{W},{T}}\left(\omega\right)\) of the parameters \(\mu\), \(\theta\), and \(\sigma_{\mathbf{W}}\), respectively, which allow the best fit of \(\mathbf{X}\) to the time series \(\mathbf{x}\).

From Equations (13.7), (13.8), and (13.12), we know that \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\mu, \qquad \sigma_{\mathbf{X}}^{2}\left(t\right)=\left(1+\theta^{2}\right)\sigma_{\mathbf{W}}^{2}, \end{equation}\] for every \(t\in\mathbb{N}\), and \[\begin{equation} \rho_{\mathbf{X},0}\left(1\right)=-\frac{\theta}{1+\theta^{2}}. \end{equation}\] The application of the method of moments requires setting \[\begin{equation} \hat{\mu}_{T}=\bar{X}_{T}\equiv\frac{1}{T}\sum\limits_{t=1}^{T}X_{t}, \tag{13.14} \end{equation}\] \[\begin{equation} \left(1+\hat{\theta}^{2}_{T}\right)\hat{\sigma}_{\mathbf{W},T}^{2} =S_{\mathbf{X},T}^{2}\equiv\frac{1}{T}\sum\limits_{t=1}^{T}\left(X_{t}-\bar{X}_{T}\right)^{2}, \tag{13.15} \end{equation}\] \[\begin{equation} -\frac{\hat{\theta}_{T}}{1+\hat{\theta}^{2}_{T}}=R_{\mathbf{X},T}\left(1\right) \equiv\frac{G_{\mathbf{X},T}\left(1\right)}{G_{\mathbf{X},T}\left(0\right)} =\frac{G_{\mathbf{X},T}\left(1\right)}{S_{\mathbf{X},T}^{2}} =\frac{\sum\limits_{t=1}^{T-1}\left(X_{t}-\bar{X}_{T}\right) \left(X_{t+1}-\bar{X}_{T}\right)} {\sum\limits_{t=1}^{T}\left(X_{t}-\bar{X}_{T}\right)^{2}}, \tag{13.16} \end{equation}\] and then solving Equations (13.14)-(13.16) to obtain estimators \(\hat{\mu}_{T}\), \(\hat{\theta}_{T}\), and \(\hat{\sigma}_{\mathbf{W},T}^{2}\) of the parameters \(\mu\), \(\theta\), and \(\sigma_{\mathbf{W}}^{2}\). Note that Equation (13.16) is the Yule-Walker equation (13.13) written in terms of the estimators.

Proposition 13.9 (MM parameter estimation for MA(1) processes) Assume that the \(MA(1)\) process \(\mathbf{X}\) is ergodic in the wide sense. Then the \(MM\) estimates of the parameters \(\mu\), \(\theta\), and \(\sigma_{\mathbf{W}}\) are given by the solution of the equations \[\begin{equation} \hat{\mu}_{T}\left(\omega\right)=\bar{x}_{T}, \qquad \left(1+\hat{\theta}_{T}^{2}\left(\omega\right)\right)\hat{\sigma}_{\mathbf{W},T}^{2}\left(\omega\right) =s_{\mathbf{X},T}^{2}, \quad\text{and}\quad -\frac{\hat{\theta}_{T}\left(\omega\right)}{1+\hat{\theta}_{T}^{2}\left(\omega\right)} =r_{\mathbf{X},T}\left(1\right), \tag{13.17} \end{equation}\] where \[\begin{equation} \bar{x}_{T}\equiv\frac{1}{T}\sum\limits_{t=1}^{T}x_{t}, \qquad s^{2}_{\mathbf{X},T}\equiv\frac{1}{T}\sum\limits_{t=1}^{T}\left(x_{t}-\bar{x}_{T}\right)^{2}, \quad\text{and}\quad r_{\mathbf{X},T}\left(1\right) =\frac{\sum\limits_{t=1}^{T-1}\left(x_{t}-\bar{x}_{T}\right)\left(x_{t+1}-\bar{x}_{T}\right)} {\sum\limits_{t=1}^{T}\left(x_{t}-\bar{x}_{T}\right)^{2}} \end{equation}\] are the realizations of the time average estimator \(\bar{X}_{T}\), the time variance estimator \(S^{2}_{\mathbf{X},T}\), and the time autocorrelation estimator \(R_{\mathbf{X},T}\left(1\right)\), respectively.
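Equations (13.17) can be solved in closed form: the third equation is equivalent to the quadratic \(r_{\mathbf{X},T}\left(1\right)\hat{\theta}_{T}^{2}+\hat{\theta}_{T}+r_{\mathbf{X},T}\left(1\right)=0\), whose two roots are reciprocals of each other, so a root with \(\left\vert\hat{\theta}_{T}\right\vert\leq1\) (the invertible one) exists precisely when \(\left\vert r_{\mathbf{X},T}\left(1\right)\right\vert\leq1/2\). The following R sketch implements this; `mm_ma1` is a hypothetical helper written for these notes, not a package function:

```r
# Method-of-moments estimation for an MA(1), solving Equations (13.17).
# mm_ma1 is a hypothetical helper, not part of any package.
mm_ma1 <- function(x) {
  T_len <- length(x)
  xbar  <- mean(x)                                                    # (13.14)
  s2    <- sum((x - xbar)^2)/T_len                                    # (13.15), right side
  r1    <- sum((x[-T_len] - xbar)*(x[-1] - xbar))/sum((x - xbar)^2)   # (13.16)
  if (abs(r1) > 0.5) stop("|r1| > 1/2: no real MA(1) solution")
  # Solving -theta/(1 + theta^2) = r1, i.e. r1*theta^2 + theta + r1 = 0,
  # and keeping the invertible root |theta| <= 1:
  theta_hat  <- if (r1 == 0) 0 else (-1 + sqrt(1 - 4*r1^2))/(2*r1)
  sigma2_hat <- s2/(1 + theta_hat^2)
  c(mu = xbar, theta = theta_hat, sigma2_W = sigma2_hat)
}

# Sanity check on the exact moments: with r1 = -0.7/(1 + 0.7^2) we recover 0.7.
r1 <- -0.7/(1 + 0.7^2)
(-1 + sqrt(1 - 4*r1^2))/(2*r1)             # 0.7 (up to rounding)
```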

13.1.2 Prediction of Future States and Prediction Intervals

Let \(\left(x_{t}\right)_{t=1}^{T}\equiv\mathbf{x}\) be a univariate real time series, for some \(T\geq2\), and let \(\left(X_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{X}\) be a wide sense ergodic \(MA\left(1\right)\) process which satisfies Equation (13.1), for some mean parameter \(\mu\in\mathbb{R}\), some memory parameter \(\theta\in\mathbb{R}\), and a strong white noise state innovation process \(\left(W_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{W}\). In symbols, \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\) for some standard deviation parameter \(\sigma_{\mathbf{W}}>0\). Assume we have determined, with some estimation method, the estimates \(\hat{\mu}\left(\omega\right)\), \(\hat{\theta}\left(\omega\right)\), and \(\hat{\sigma}_{\mathbf{W}}\left(\omega\right)\) of the parameters \(\mu\), \(\theta\), and \(\sigma_{\mathbf{W}}\), respectively, which allow the best fit of the \(MA\left(1\right)\) process \(\mathbf{X}\) to the time series \(\mathbf{x}\). Let \(\left(\mathcal{F}_{t}^{\mathbf{W}}\right)_{t\in\mathbb{N}_{0}}\equiv\mathfrak{F}^{\mathbf{W}}\) be the filtration generated by the innovation process \(\mathbf{W}\). For any \(S,T\in \mathbb{N}\), write \(X_{T+S}\) for the \(S\)th future state of the process \(\mathbf{X}\) with respect to the current state \(X_{T}\) and write \(\hat{X}_{T+S\mid T}\) for the minimum mean square error predictor of the \(S\)th future state of the process \(\mathbf{X}\), given the information represented by \(\mathcal{F}_{T}^{\mathbf{W}}\). Formally, \[\begin{equation} \hat{X}_{T+S\mid T}=\underset{Y\in L^{2}\left(\Omega_{\mathcal{F}_{T}^{\mathbf{W}}};\mathbb{R}\right)} {\arg\min}\mathbf{E}\left[\left(Y-X_{T+S}\right)^{2}\right], \tag{13.18} \end{equation}\] where \(L^{2}\left(\Omega_{\mathcal{F}_{T}^{\mathbf{W}}};\mathbb{R}\right)\) is the Hilbert space of the random variables which are measurable with respect to the \(\sigma\)-algebra \(\mathcal{F}_{T}^{\mathbf{W}}\) and have finite moment of order \(2\).
As a consequence of Equation (13.18), we have \[\begin{equation} \hat{X}_{T+S\mid T}=\mathbf{E}\left[X_{T+S}\mid\mathcal{F}_{T}^{\mathbf{W}}\right]. \tag{13.19} \end{equation}\] It is also well known that \[\begin{equation} \hat{X}_{T+S\mid T}=h\left(W_{0},W_{1},\dots,W_{T}\right), \tag{13.20} \end{equation}\] where \(h:\mathbb{R}^{T+1}\rightarrow\mathbb{R}\) is a function such that \[\begin{equation} h\left(\cdot,\dots,\cdot\right)= \underset{g:\mathbb{R}^{T+1}\rightarrow\mathbb{R}\text{ s.t. } g\left(W_{0},W_{1},\dots,W_{T}\right)\in L^{2}\left(\Omega_{\mathcal{F}_{T}^{\mathbf{W}}};\mathbb{R}\right)} {\arg\min}\mathbf{E}\left[\left(g\left(W_{0},W_{1},\dots,W_{T}\right)-X_{T+S}\right)^{2}\right]. \tag{13.21} \end{equation}\] Moreover, any two solutions \(h_{1}\) and \(h_{2}\) of the minimization problem (13.21) can differ only on a subset of \(\mathbb{R}^{T+1}\) with zero Lebesgue measure.

For the reader’s convenience, we recall again the important results mentioned in the context of \(AR(1)\) processes, which depend only on the properties of the conditional expectation operator on the Hilbert space of the random variables with finite moment of order \(2\) and are independent of the \(MA(1)\) structure of the process \(\mathbf{X}\).

Proposition 13.10 (Characterization of the predictor) We have \[\begin{equation} \mathbf{E}\left[X_{T+k\mid T}\right]=\mathbf{E}\left[X_{T+k}\right] \tag{13.22} \end{equation}\] and \[\begin{equation} Cov\left(X_{T+k}-X_{T+k\mid T},X_{T+k\mid T}\right)=0, \tag{13.23} \end{equation}\] for every \(k\in\mathbb{N}\), or, equivalently, \[\begin{equation} \mathbf{E}\left[\left(X_{T+k}-X_{T+k\mid T}\right)X_{T+k\mid T}\right]=0, \tag{13.24} \end{equation}\] for every \(k\in\mathbb{N}\).

Corollary 13.7 (Characterization of the predictor) We have \[\begin{equation} \mathbf{E}\left[X_{T+k\mid T}^{2}\right]=\mathbf{E}\left[X_{T+k}X_{T+k\mid T}\right] \tag{13.25} \end{equation}\] and \[\begin{equation} \mathbf{D}^{2}\left[X_{T+k\mid T}\right]=Cov\left(X_{T+k},X_{T+k\mid T}\right), \tag{13.26} \end{equation}\] for every \(k\in\mathbb{N}\).

For the reader’s convenience, we also recall the following definition.

Definition 13.3 (Prediction error and mean squared error) We call prediction error of the predictor \(X_{T+k\mid T}\) of the state \(X_{T+k}\) the random variable \[\begin{equation} E_{T+k\mid T}\overset{\text{def}}{=}X_{T+k}-X_{T+k\mid T},\quad\forall k\in\mathbb{N}. \tag{13.27} \end{equation}\] We call mean squared error of the predictor \(X_{T+k\mid T}\) of the state \(X_{T+k}\), the nonnegative number \[\begin{equation} \mathbf{MSE}\left[E_{T+k\mid T}\right]\overset{\text{def}}{=}\mathbf{E}\left[E_{T+k\mid T}^{2}\right],\quad\forall k\in\mathbb{N}. \tag{13.28} \end{equation}\]

Remark (**Characterization of the prediction error**). We have \[\begin{equation} \mathbf{E}\left[E_{T+k\mid T}\right]=0, \tag{13.29} \end{equation}\] for every \(k\in\mathbb{N}\). Hence, \[\begin{equation} \mathbf{MSE}\left[E_{T+k\mid T}\right]=\mathbf{D}^{2}\left[E_{T+k\mid T}\right], \tag{13.30} \end{equation}\] for every \(k\in\mathbb{N}\).

Proposition 13.11 (Characterization of the prediction error) We have \[\begin{equation} \mathbf{D}^{2}\left[E_{T+k\mid T}\right]=\mathbf{D}^{2}\left[X_{T+k}\right]-\mathbf{D}^{2}\left[X_{T+k\mid T}\right], \tag{13.31} \end{equation}\] for every \(k\in\mathbb{N}\).

Proof. Since the result does not depend on the \(MA\left(1\right)\) structure, the proof is the same as the proof of Proposition 12.17.
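In fact, the computation behind Equation (13.31) is short; using Equation (13.27), the bilinearity of the covariance, and Equation (13.26), we have \[\begin{align} \mathbf{D}^{2}\left[E_{T+k\mid T}\right] & =\mathbf{D}^{2}\left[X_{T+k}-X_{T+k\mid T}\right] =\mathbf{D}^{2}\left[X_{T+k}\right]+\mathbf{D}^{2}\left[X_{T+k\mid T}\right]-2Cov\left(X_{T+k},X_{T+k\mid T}\right)\\ & =\mathbf{D}^{2}\left[X_{T+k}\right]+\mathbf{D}^{2}\left[X_{T+k\mid T}\right]-2\mathbf{D}^{2}\left[X_{T+k\mid T}\right] =\mathbf{D}^{2}\left[X_{T+k}\right]-\mathbf{D}^{2}\left[X_{T+k\mid T}\right]. \end{align}\]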

We now turn our attention back to \(MA\left(1\right)\) processes.

Proposition 13.12 (Characterization of the MA(1) predictor) We have \[\begin{equation} X_{T+k\mid T}=\left\{ \begin{array} [c]{ll} \mu-\theta W_{T}, & \text{if }k=1,\\ \mu, & \text{if }k>1. \end{array} \right. \tag{13.32} \end{equation}\] As a consequence, \[\begin{equation} \mathbf{D}^{2}\left[X_{T+k\mid T}\right]=\left\{ \begin{array} [c]{ll} \theta^{2}\sigma_{\mathbf{W}}^{2}, & \text{if }k=1,\\ 0, & \text{if }k>1. \end{array} \right. \tag{13.33} \end{equation}\]

Proof. Thanks to Equation (13.1) referred to \(T+k\), we can write \[\begin{equation} X_{T+k}=\mu+W_{T+k}-\theta W_{T+k-1}. \end{equation}\] Therefore, by virtue of the properties of the conditional expectation, we obtain \[\begin{align} X_{T+k\mid T} & =\mathbf{E}\left[\mu+W_{T+k}-\theta W_{T+k-1}\mid\mathcal{F}_{T}^{\mathbf{W}}\right]\\ & =\mu+\mathbf{E}\left[W_{T+k}\mid\mathcal{F}_{T}^{\mathbf{W}}\right] -\theta\mathbf{E}\left[W_{T+k-1}\mid\mathcal{F}_{T}^{\mathbf{W}}\right], \tag{13.34} \end{align}\] where, considering that \(\mathbf{W}\sim SWN\left(\sigma_{\mathbf{W}}^{2}\right)\), \[\begin{equation} \mathbf{E}\left[W_{T+k}\mid\mathcal{F}_{T}^{\mathbf{W}}\right]=\mathbf{E}\left[W_{T+k}\right]=0, \text{ for every }k\geq 1, \quad\text{and}\quad \mathbf{E}\left[W_{T+k-1}\mid\mathcal{F}_{T}^{\mathbf{W}}\right]=\left\{ \begin{array} [c]{ll} W_{T}, & \text{if }k=1,\\ \mathbf{E}\left[W_{T+k-1}\right]=0, & \text{if }k>1. \end{array} \right. \tag{13.35} \end{equation}\] Combining (13.34) and (13.35), the desired Equation (13.32) follows. Now, still considering Equation (13.1) referred to \(T+k\), together with (13.26) and (13.32), we have \[\begin{equation} \mathbf{D}^{2}\left[ X_{T+k\mid T}\right] =\left\{ \begin{array} [c]{ll}% Cov\left(\mu+W_{T+1}-\theta W_{T},\mu-\theta W_{T}\right), & \text{if }k=1,\\ Cov\left(\mu+W_{T+k}-\theta W_{T+k-1},\mu\right), & \text{if }k>1, \end{array} \right. \tag{13.36} \end{equation}\] where \[\begin{equation} Cov\left(\mu+W_{T+1}-\theta W_{T},\mu-\theta W_{T}\right)=\theta^{2}\sigma_{\mathbf{W}}^{2} \quad\text{and}\quad Cov\left(\mu+W_{T+k}-\theta W_{T+k-1},\mu\right)=0. \tag{13.37} \end{equation}\] From (13.36) and (13.37), we obtain Equation (13.33).

Corollary 13.8 (Characterization of the MA(1) prediction error) We have \[\begin{equation} \mathbf{D}^{2}\left[E_{T+k\mid T}\right]=\left\{ \begin{array} [c]{ll} \sigma_{\mathbf{W}}^{2}, & \text{if }k=1,\\ \left(1+\theta^{2}\right)\sigma_{\mathbf{W}}^{2}, & \text{if }k>1. \end{array} \right. \tag{13.38} \end{equation}\]

Proof. Replacing both Equations (13.8) and (13.33) into Equation (13.31), the desired result immediately follows.

Proposition 13.13 (Characterization of MA(1) prediction intervals) Assume that the innovation \(\mathbf{W}\) is a Gaussian white noise, in symbols \(\mathbf{W}\sim GWN\left(\sigma_{\mathbf{W}}^{2}\right)\). Then the prediction error \(E_{T+S\mid T}\) is Gaussian for every \(S\in\mathbb{N}\). Therefore, a prediction interval for the state \(X_{T+S}\), for any \(S\in\mathbb{N}\), at the confidence level of \(100\left(1-\alpha\right)\%\), for any \(\alpha\in\left(0,1\right)\), is given by \[\begin{equation} \left(X_{T+S\mid T}-z_{\alpha/2}\mathbf{D}\left[E_{T+S\mid T}\right],\ X_{T+S\mid T}+z_{\alpha/2}\mathbf{D}\left[E_{T+S\mid T}\right]\right), \tag{13.39} \end{equation}\] where \(z_{\alpha/2}\equiv z_{\alpha/2}^{+}\) is the upper tail critical value of level \(\alpha/2\) of the standard Gaussian random variable and, by Corollary 13.8, \(\mathbf{D}\left[E_{T+S\mid T}\right]=\sigma_{\mathbf{W}}\) if \(S=1\) and \(\mathbf{D}\left[E_{T+S\mid T}\right]=\sqrt{1+\theta^{2}}\,\sigma_{\mathbf{W}}\) if \(S>1\). For \(S=1\), the realization of the prediction interval is then \[\begin{equation} \left(x_{T+1\mid T}-z_{\alpha/2}\hat{\sigma}_{\mathbf{W}}\left(\omega\right),\ x_{T+1\mid T}+z_{\alpha/2}\hat{\sigma}_{\mathbf{W}}\left(\omega\right)\right), \tag{13.40} \end{equation}\] where \(x_{T+1\mid T}\) is the realization of the predictor \(X_{T+1\mid T}\) of the state \(X_{T+1}\) and \(\hat{\sigma}_{\mathbf{W}}\left(\omega\right)\) is the estimated value of the parameter \(\sigma_{\mathbf{W}}\).
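The prediction intervals above can be reproduced in R with stats::arima and predict, whose component $se returns the forecast standard errors. This is only a sketch on simulated data (illustrative parameters \(\mu=0.5\), \(\theta=0.7\), \(\sigma_{\mathbf{W}}=1\)); recall that arima() adopts the plus-sign MA convention, so its ma1 coefficient estimates \(-\theta\) of Equation (13.1):

```r
# Sketch: prediction intervals for an MA(1) via stats::arima / predict.
set.seed(12345)
x <- 0.5 + arima.sim(model = list(ma = -0.7), n = 500, sd = 1)  # theta = 0.7 in (13.1)
fit <- arima(x, order = c(0, 0, 1))                             # MA(1) with mean
fc  <- predict(fit, n.ahead = 3)
z   <- qnorm(1 - 0.05/2)                                        # 95% confidence level
cbind(lower = fc$pred - z*fc$se, upper = fc$pred + z*fc$se)     # Equations (13.39)-(13.40)
# Consistency with Corollary 13.8:
sigma_hat <- sqrt(fit$sigma2)
theta_hat <- -coef(fit)[["ma1"]]
c(one_step = fc$se[1], sigma_hat = sigma_hat)                   # approximately equal
c(multi_step = fc$se[2], sqrt(1 + theta_hat^2)*sigma_hat)       # approximately equal
```

The printed one-step standard error is approximately \(\hat{\sigma}_{\mathbf{W}}\) and the multi-step ones approximately \(\sqrt{1+\hat{\theta}^{2}}\,\hat{\sigma}_{\mathbf{W}}\), in agreement with Corollary 13.8.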

13.1.3 Examples

We build an \(MA(1)\) process with drift and linear trend and consider some of its sample paths.

t <- seq(from=-0.49, to=1.00, length.out=150)     # Choosing the time set.
a <- 0.5                                          # Choosing the mean coefficient. 
b <- 5.0                                          # Choosing the linear trend coefficient.
g <- 0.7                                          # Choosing the memory coefficient.
set.seed(12345, kind=NULL, normal.kind=NULL)      # Setting a random seed for reproducibility.
Ext_Gauss_r <- rnorm(n=151, mean=0, sd=9)         # Determining one of the possible values of the Gaussian 
Gauss_r_0 <- Ext_Gauss_r[1]                       # random variables in the state innovation process. 
Gauss_r <- Ext_Gauss_r[-1]
                                                  # Showing the values taken by the Gaussian random variables
                                                  # in the state innovation process. 
head(Gauss_r)                                     # Initial part of the sample path of the state innovation.
## [1]   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037   5.6708870
tail(Gauss_r)                                     # Final part of the sample path of the state innovation.
## [1]   4.861526 -13.925628   7.646876   8.064119   1.248219 -14.573955
x_r <- rep(NA,150)                                # Setting an empty vector of length 150 to store 
                                                  # the sample path of the MA(1) process, corresponding to 
                                                  # the sample path of the state innovation.
x0 <- 0                                           # Choosing the starting point of the MA(1) process.
x_r[1] <- a + b*t[1] + g*Gauss_r_0 + Gauss_r[1]   # Determining the first point (after the starting point) 
                                                  # of the sample path of the MA(1) process. 
for (n in 2:150)
{x_r[n] <- a + b*t[n] + g*Gauss_r[n-1] + Gauss_r[n]}  # Determining the other points of the sample path 
                                                      # of the MA(1) process.
head(x_r)                                   # Showing the initial part of the sample path of the MA(1) process.
## [1]   8.1240257   1.5859061  -6.6200854   0.7959549 -14.2945127  -7.4822356
tail(x_r)                                   # Showing the final part of the sample path of the MA(1) process.
## [1] 10.211417 -5.222559  3.248937 18.816932 12.343102 -8.200202
set.seed(23451, kind=NULL, normal.kind=NULL)      # Setting another random seed for reproducibility 
                                                  # and building another sample path of the MA(1) process.

Ext_Gauss_b <- replace(Ext_Gauss_r, c(51:150), rnorm(n=100, mean=0, sd=9)) # Building another sample path of the  
                                                                       # Gaussian state innovation process,  
                                                                       # which retains the first 50 sample points 
                                                                       # of the former path.
Gauss_b_0 <- Ext_Gauss_b[1]
Gauss_b <- Ext_Gauss_b[-1]
head(Gauss_b)                                     # Initial part of the sample path of the state innovation.
## [1]   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037   5.6708870
tail(Gauss_b)                                     # Final part of the sample path of the state innovation.
## [1]   5.182616  -4.460372   1.885254   4.087216  -1.627632 -14.573955
x_b <- rep(NA,150)                                # Setting an empty vector of length 150 to store 
                                                  # the sample path of the MA(1) process, corresponding to 
                                                  # the sample path of the state innovation.
x0 <- 0                                           # Choosing the starting point of the MA(1) process.
x_b[1] <- a + b*t[1] + g*Gauss_b_0 + Gauss_b[1]   # Determining the first point (after the starting point) 
                                                  # of the sample path of the MA(1) process. 
for (n in 2:150)
{x_b[n] <- a + b*t[n] + g*Gauss_b[n-1] + Gauss_b[n]}  # Determining the other points of the sample path 
                                                      # of the MA(1) process.
head(x_b)                                             # Initial part of the sample path of the MA(1) process.
## [1]   8.1240257   1.5859061  -6.6200854   0.7959549 -14.2945127  -7.4822356
tail(x_b)                                             # Final part of the sample path of the MA(1) process.
## [1]   7.067625   4.467459   4.112993  10.806893   6.683419 -10.213297
set.seed(34512, kind=NULL, normal.kind=NULL)
Ext_Gauss_g <- replace(Ext_Gauss_r, c(51:150), rnorm(n=100, mean=0, sd=9))

Gauss_g_0 <- Ext_Gauss_g[1]
Gauss_g <- Ext_Gauss_g[-1]
head(Gauss_g)                                     # Initial part of the sample path of the state innovation.
## [1]   6.3851942  -0.9837298  -4.0814746   5.4529871 -16.3616037   5.6708870
tail(Gauss_g)                                     # Final part of the sample path of the state innovation.
## [1]   6.86117068   1.45432151  -0.06290163  -2.54515536  -7.59723922
## [6] -14.57395490
x_g <- rep(NA,150)                                # Setting an empty vector of length 150 to store 
                                                  # the sample path of the MA(1) process, corresponding to 
                                                  # the sample path of the state innovation.
x0 <- 0                                           # Choosing the starting point of the MA(1) process.
x_g[1] <- a + b*t[1] + g*Gauss_g_0 + Gauss_g[1]   # Determining the first point (after the starting point) 
                                                  # of the sample path of the MA(1) process. 
for (n in 2:150)
{x_g[n] <- a + b*t[n] + g*Gauss_g[n-1] + Gauss_g[n]}  # Determining the other points of the sample path 
                                                      # of the MA(1) process.
head(x_g)                                             # Initial part of the sample path of the MA(1) process.
## [1]   8.1240257   1.5859061  -6.6200854   0.7959549 -14.2945127  -7.4822356
tail(x_g)                                             # Final part of the sample path of the MA(1) process.
## [1]   5.985709  11.557141   6.305123   2.810813  -3.928848 -14.392022
Gauss_MA1_df <- data.frame(t,x_r,x_b,x_g)         # Generating a data frame from the time variable 
                                                  # and the three paths of the MA(1) process.
head(Gauss_MA1_df)
##       t         x_r         x_b         x_g
## 1 -0.49   8.1240257   8.1240257   8.1240257
## 2 -0.48   1.5859061   1.5859061   1.5859061
## 3 -0.47  -6.6200854  -6.6200854  -6.6200854
## 4 -0.46   0.7959549   0.7959549   0.7959549
## 5 -0.45 -14.2945127 -14.2945127 -14.2945127
## 6 -0.44  -7.4822356  -7.4822356  -7.4822356
# library(dplyr)
Gauss_MA1_df <- add_row(Gauss_MA1_df, t=-0.50, x_r=0, x_b=0, x_g=0, .before=1) # Adding a row to represent
                                                                               # the starting point of the
                                                                               # MA(1) process.
head(Gauss_MA1_df)
##       t         x_r         x_b         x_g
## 1 -0.50   0.0000000   0.0000000   0.0000000
## 2 -0.49   8.1240257   8.1240257   8.1240257
## 3 -0.48   1.5859061   1.5859061   1.5859061
## 4 -0.47  -6.6200854  -6.6200854  -6.6200854
## 5 -0.46   0.7959549   0.7959549   0.7959549
## 6 -0.45 -14.2945127 -14.2945127 -14.2945127
tail(Gauss_MA1_df)
##        t       x_r        x_b        x_g
## 146 0.95 10.211417   7.067625   5.985709
## 147 0.96 -5.222559   4.467459  11.557141
## 148 0.97  3.248937   4.112993   6.305123
## 149 0.98 18.816932  10.806893   2.810813
## 150 0.99 12.343102   6.683419  -3.928848
## 151 1.00 -8.200202 -10.213297 -14.392022

We plot the paths of the \(MA(1)\) process. First, the scatter plot.

# library(ggplot2)
Data_df <- Gauss_MA1_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", paste("Scatter Plot of Three Paths of a Gaussian MA(1) Process with Drift and Linear Trend for t = ", .(First_Date), " to t = ", .(Last_Date))))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  memory par. ", theta==.(g),","),
                                paste("state innovation random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    state innovation var. par. ", sigma^2==81,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)-min(Data_df$x_r,Data_df$x_b,Data_df$x_g))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <-  0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_b_col <- bquote("random seed" ~  23451)
x_g_col <- bquote("random seed" ~  34512)
leg_labs <- c(x_k_col, x_r_col, x_b_col, x_g_col)
leg_ord <- c("x_k_col", "x_r_col", "x_b_col", "x_g_col")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_b_col"="blue", "x_g_col"="green")
Data_df_SP <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_point(alpha=1, size=1, aes(y=x_r, color="x_r_col")) +
  geom_point(alpha=1, size=1, aes(y=x_b, color="x_b_col")) +
  geom_point(alpha=1, size=1, aes(y=x_g, color="x_g_col")) +
  geom_point(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_SP)

Second, the line plot.

# library(ggplot2)
Data_df <- Gauss_MA1_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of Three Paths of a Gaussian MA(1) process with Drift and Linear Trend for t = ", First_Date, " to t = ", Last_Date))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  memory par. ", theta==.(g),","),
                                paste("state innovation random seeds ", 12345, ", " , 23451, ", " , 34512, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)-min(Data_df$x_r,Data_df$x_b,Data_df$x_g))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r,Data_df$x_b,Data_df$x_g)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_b_col <- bquote("random seed" ~  23451)
x_g_col <- bquote("random seed" ~  34512)
leg_labs <- c(x_k_col, x_r_col, x_b_col, x_g_col)
leg_ord <- c("x_k_col", "x_r_col", "x_b_col", "x_g_col")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_b_col"="blue", "x_g_col"="green")
Data_df_LP <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_line(alpha=1, size=0.6, aes(y=x_b, color="x_b_col"), group=1) +
  geom_line(alpha=1, size=0.6, aes(y=x_g, color="x_g_col"), group=1) +
  geom_line(alpha=1, size=0.6, aes(y=x_r, color="x_r_col"), group=1) +
  geom_line(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_color_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_LP)

From a visual inspection of both the scatter and line plots, the three paths of the MA(1) process show slight evidence of a trend, but no evidence of seasonality. Moreover, there is no visual evidence of heteroskedasticity.

We concentrate on the analysis of the black-red path, characterized by random seed 12345.

The scatter plot.

# library(ggplot2)
Data_df <- Gauss_MA1_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of the Black-Red Path of a Gaussian MA(1) process with Drift and Linear Trend for t=", First_Date, " to t=", Last_Date))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  memory par. ", theta==.(g),","),
                                      paste("state innovation random seed ", 12345, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r)-min(Data_df$x_r))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_rgrln <- bquote("Regression Line")
x_loess <- bquote("LOESS Curve")
leg_labs <- c(x_k_col, x_r_col, x_rgrln, x_loess)
leg_ord  <- c("x_k_col", "x_r_col", "x_rgrln", "x_loess")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_rgrln"="blue", "x_loess"="green")
Data_df_SP <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.5, linetype="solid", aes(x=t, y=x_r, color="x_rgrln"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=x_r, color="x_loess"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1, aes(y=x_r, color="x_r_col")) +
  geom_point(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord,
                     guide=guide_legend(override.aes=list(shape=c(NA,NA,NA,NA),
                     linetype=c("dotted", "dotted", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_SP)

The line plot.

# library(ggplot2)
Data_df <- Gauss_MA1_df
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of the Black-Red Path of a Gaussian MA(1) process with Drift and Linear Trend for t=", First_Date, " to t=", Last_Date))
subtitle_content <- bquote(atop(paste("path length ", .(nrow(Data_df)), " sample points,    starting point ",
                                      x[0]==0, ",    drift par. ", alpha==.(a), ",  linear trend par. ", 
                                      beta==.(b), ",  memory par. ", theta==.(g),","),
                                      paste("state innovation random seed ", 12345, 
                                      ",    state innovation var. par. ", sigma^2==1,".")))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r)-min(Data_df$x_r))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)

y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
x_k_col <- bquote("random seed" ~  12345)
x_r_col <- bquote("random seed" ~  12345)
x_rgrln <- bquote("Regression Line")
x_loess <- bquote("LOESS Curve")
leg_labs <- c(x_k_col, x_r_col, x_rgrln, x_loess)
leg_ord  <- c("x_k_col", "x_r_col", "x_rgrln", "x_loess")
leg_cols <- c("x_k_col"="black", "x_r_col"="red", "x_rgrln"="blue", "x_loess"="green")
Data_df_LP <- ggplot(Data_df, aes(x=t)) + 
  geom_hline(yintercept = 0, size=0.3, colour="black") +
  geom_vline(xintercept = 0, size=0.3, colour="black") +
  geom_smooth(alpha=1, size = 0.5, linetype="solid", aes(x=t, y=x_r, color="x_rgrln"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=x_r, color="x_loess"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.6, aes(y=x_r, color="x_r_col"), group=1) +
  geom_line(data=subset(Data_df, Data_df$t <= 0), alpha=1, size=1, aes(y=x_r, color="x_k_col")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, labels=x_breaks, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord,
                     guide=guide_legend(override.aes=list(linetype=c("solid", "solid", "solid", "dashed")))) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
plot(Data_df_LP)

Plot of the autocorrelogram.

y <- Gauss_MA1_df$x_r
length <- length(y)
maxlag <- ceiling(10*log10(length))
Aut_Fun_y <- acf(y, lag.max = maxlag, type="correlation", plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_Aut_Fun_y <- data.frame(lag=Aut_Fun_y$lag, acf=Aut_Fun_y$acf)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of the Autocorrelogram of the Black-Red Path of a Gaussian MA(1) process with Drift and Linear Trend for t=", First_Date, " to t=", Last_Date))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_Aut_Fun_y, aes(x=lag, y=acf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=acf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="acf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")
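The horizontal lines in the correlogram are asymptotic white-noise bands: under the null of no autocorrelation, each sample autocorrelation is approximately \(N(0,1/n)\), which is where the factor `qnorm((1+level)/2)/sqrt(length)` comes from. A quick numerical check, with n = 151 as in this simulation:

```r
# Under H0 (white noise), hat(rho)(k) ~ N(0, 1/n) approximately, so the
# two-sided band at level 1-a is +/- qnorm(1 - a/2)/sqrt(n).
n <- 151
ci_90 <- qnorm((1 + 0.90) / 2) / sqrt(n)
ci_95 <- qnorm((1 + 0.95) / 2) / sqrt(n)
ci_99 <- qnorm((1 + 0.99) / 2) / sqrt(n)
round(c(ci_90, ci_95, ci_99), 4)  # 0.1339 0.1595 0.2096
```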

Plot of the partial autocorrelogram.

y <- Gauss_MA1_df$x_r
length <- length(y)
maxlag <- ceiling(10*log10(length))
P_Aut_Fun_y <- pacf(y, lag.max = maxlag, plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_P_Aut_Fun_y <- data.frame(lag=P_Aut_Fun_y$lag, pacf=P_Aut_Fun_y$acf)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of the Partial Autocorrelogram of the Black-Red Path of a Gaussian MA(1) process with Drift and Linear Trend for t=", First_Date, " to t=", Last_Date))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_P_Aut_Fun_y, aes(x=lag, y=pacf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=pacf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="pacf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

The autocorrelogram and partial autocorrelogram show evidence of autocorrelation of MA(1) type.

We apply the Ljung-Box (LB) test.

y <- Gauss_MA1_df$x_r
Box.test(y, lag = 1, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  y
## X-squared = 28.599, df = 1, p-value = 8.904e-08

The null hypothesis of no autocorrelation is rejected at any conventional significance level.
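The statistic behind `Box.test` is easy to reproduce by hand: \(Q = n(n+2)\sum_{k=1}^{h}\hat{\rho}_{k}^{2}/(n-k)\), compared with a \(\chi^{2}_{h}\) distribution. A minimal sketch on an illustrative simulated MA(1) series (not the notes' path, so the numbers differ from the output above):

```r
# Ljung-Box statistic computed by hand and checked against Box.test.
set.seed(1)
y <- arima.sim(model = list(ma = 0.8), n = 151)    # illustrative MA(1) series
n <- length(y)
h <- 1
rho <- acf(y, lag.max = h, plot = FALSE)$acf[-1]   # hat(rho)(1), ..., hat(rho)(h)
Q <- n * (n + 2) * sum(rho^2 / (n - seq_len(h)))   # Ljung-Box statistic
p_val <- pchisq(Q, df = h, lower.tail = FALSE)
bt <- Box.test(y, lag = h, type = "Ljung-Box")
all.equal(unname(bt$statistic), Q)    # TRUE
all.equal(unname(bt$p.value), p_val)  # TRUE
```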

In light of the visual evidence from the scatter and line plots and from the autocorrelograms, together with the computational evidence from the LB test, there is overall evidence that the process generating the time series might be an MA(1) process with a linear trend.

To deal with the likely non-stationarity of the time series, we consider the linear regression of the time series on the time variable.

Gauss_MA1_lm  <- lm(x_r~t, data=Gauss_MA1_df)
summary(Gauss_MA1_lm)
## 
## Call:
## lm(formula = x_r ~ t, data = Gauss_MA1_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.9156  -9.3908  -0.8861   9.6956  28.7589 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    3.512      1.149   3.056  0.00266 **
## t              2.155      2.287   0.942  0.34766   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.25 on 149 degrees of freedom
## Multiple R-squared:  0.005922,   Adjusted R-squared:  -0.00075 
## F-statistic: 0.8876 on 1 and 149 DF,  p-value: 0.3477

We then analyze the residuals of the linear model.

# library(tibble)
Gauss_MA1_lm_df <- add_column(Gauss_MA1_df, x_r_res=Gauss_MA1_lm$residuals, .after="x_r")
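As a sanity check, the residuals stored here are just the series minus the fitted trend line. A small self-contained illustration on toy data (not the notes' path; the grid and parameters are illustrative):

```r
# Residuals of a linear fit equal the series minus the fitted trend.
set.seed(1)
t <- seq(0, 1.5, length.out = 151)        # illustrative time grid
x <- 3.5 + 2 * t + rnorm(151, sd = 12)    # illustrative trend-plus-noise series
fit <- lm(x ~ t)
manual_res <- x - (coef(fit)[1] + coef(fit)[2] * t)
all.equal(unname(manual_res), unname(residuals(fit)))  # TRUE
```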

The residuals scatter plot.

# The Residuals scatter plot
Data_df  <- Gauss_MA1_lm_df
length <- nrow(Data_df)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Scatter Plot of Residuals vs Time of the Linear Model for the Black-Red Path of a Gaussian MA(1) process with Drift and Linear Trend for t=", First_Date, " to t=", Last_Date))
subtitle_content <- bquote(paste("path length ", .(length), " sample points"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r)-min(Data_df$x_r))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Path of Residuals vs Time")
col_2 <- bquote("LOESS Curve")
col_3 <- bquote("Regression Line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_ord <- c("col_1", "col_2", "col_3")
Data_df_SP <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=x_r_res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=x_r_res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_point(alpha=1, size=1.0, shape=19, aes(x=t, y=x_r_res, color="col_1")) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord,
                      guide=guide_legend(override.aes=list(shape=c(NA,NA,NA), 
                      linetype=c("dotted", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=90, vjust=1),
        legend.key.width = unit(1.0,"cm"), legend.position="bottom")
plot(Data_df_SP)

The residuals line plot.

# The Residuals Line plot
Data_df  <- Gauss_MA1_lm_df
length <- nrow(Data_df)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Line Plot of Residuals vs Time of the Linear Model for the Black-Red Path of a Gaussian MA(1) process with Drift and Linear Trend for t=", First_Date, " to t=", Last_Date))
subtitle_content <- bquote(paste("path length ", .(length), " sample points"))
caption_content <- "Author: Roberto Monte"
x_name <- bquote(~ t ~ "values")
y_name <- bquote(~ X[t] ~ "values")
x_breaks_num <- 15
x_binwidth <- round((max(Data_df$t)-min(Data_df$t))/x_breaks_num, digits=3)
x_breaks_low <- floor((min(Data_df$t)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$t)/x_binwidth))*x_binwidth
x_breaks <- round(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth),3)
x_labs <- format(x_breaks, scientific=FALSE)
j <- 0
x_lims <- c(x_breaks_low-j*x_binwidth, x_breaks_up+j*x_binwidth)
y_breaks_num <- 10
y_binwidth <- round((max(Data_df$x_r)-min(Data_df$x_r))/y_breaks_num, digits=3)
y_breaks_low <- floor((min(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks_up <- ceiling((max(Data_df$x_r)/y_binwidth))*y_binwidth
y_breaks <- round(seq(from=y_breaks_low, to=y_breaks_up, by=y_binwidth),3)
y_labs <- format(y_breaks, scientific=FALSE)
K <- 0
y_lims <- c((y_breaks_low-K*y_binwidth), (y_breaks_up+K*y_binwidth))
col_1 <- bquote("Path of Residuals vs Time")
col_2 <- bquote("LOESS Curve")
col_3 <- bquote("Regression Line")
leg_labs <- c(col_1, col_2, col_3)
leg_cols <- c("col_1"="blue", "col_2"="red", "col_3"="green")
leg_ord <- c("col_1", "col_2", "col_3")
Data_df_LP <- ggplot(Data_df) +
  geom_smooth(alpha=1, size = 0.8, linetype="solid", aes(x=t, y=x_r_res, color="col_3"),
              method = "lm" , formula = y ~ x, se=FALSE, fullrange=TRUE) +
  geom_smooth(alpha=1, size = 0.8, linetype="dashed", aes(x=t, y=x_r_res, color="col_2"),
              method = "loess", formula = y ~ x, se=FALSE) +
  geom_line(alpha=1, size=0.5, linetype="solid", aes(x=t, y=x_r_res, color="col_1", group=1)) +
  scale_x_continuous(name=x_name, breaks=x_breaks, label=x_labs, limits=x_lims) +
  scale_y_continuous(name=y_name, breaks=y_breaks, labels=NULL, limits=y_lims,
                     sec.axis = sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  scale_colour_manual(name="Legend", labels=leg_labs, values=leg_cols, breaks=leg_ord,
                      guide=guide_legend(override.aes=list(linetype=c("solid", "dashed", "solid")))) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=90, vjust=1),
        legend.key.width = unit(1.0,"cm"), legend.position="bottom")
plot(Data_df_LP)

The linear regression removes the trend. The scatter and line plots of the residuals of the linear model show no evidence of non-stationarity, nor of heteroskedasticity.

We now consider the correlograms of the residuals.

Plot of the autocorrelogram.

y <- Gauss_MA1_lm_df$x_r_res
length <- length(y)
maxlag <- ceiling(10*log10(length))
Aut_Fun_y <- acf(y, lag.max = maxlag, type="correlation", plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_Aut_Fun_y <- data.frame(lag=Aut_Fun_y$lag, acf=Aut_Fun_y$acf)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of the Autocorrelogram of the Residuals of the Linear Model for the Black-Red Path of a Gaussian MA(1) process with Drift and Linear Trend for t=", First_Date, " to t=", Last_Date))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_Aut_Fun_y, aes(x=lag, y=acf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=acf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="acf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

Plot of the partial autocorrelogram.

y <- Gauss_MA1_lm_df$x_r_res
length <- length(y)
maxlag <- ceiling(10*log10(length))
P_Aut_Fun_y <- pacf(y, lag.max = maxlag, plot=FALSE)
ci_90 <- qnorm((1+0.90)/2)/sqrt(length)
ci_95 <- qnorm((1+0.95)/2)/sqrt(length)
ci_99 <- qnorm((1+0.99)/2)/sqrt(length)
Plot_P_Aut_Fun_y <- data.frame(lag=P_Aut_Fun_y$lag, pacf=P_Aut_Fun_y$acf)
First_Date <- as.character(Data_df$t[1])
Last_Date <- as.character(Data_df$t[nrow(Data_df)])
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Plot of the Partial Autocorrelogram of the Residuals of the Linear Model for the Black-Red Path of a Gaussian MA(1) process with Drift and Linear Trend for t=", First_Date, " to t=", Last_Date))
subtitle_content <- bquote(paste("path length ", .(length), " sample points,   ", "lags ", .(maxlag)))
caption_content <- "Author: Roberto Monte"
ggplot(Plot_P_Aut_Fun_y, aes(x=lag, y=pacf)) + 
  geom_segment(aes(x=lag, y=rep(0,length(lag)), xend=lag, yend=pacf), size = 1, col="black") +
  # geom_col(mapping=NULL, data=NULL, position="dodge", width = 0.1, col="black", inherit.aes = TRUE)+
  geom_hline(aes(yintercept=-ci_90, color="CI_90"), show.legend = TRUE, lty=3) +
  geom_hline(aes(yintercept=ci_90, color="CI_90"), lty=3) +
  geom_hline(aes(yintercept=ci_95, color="CI_95"), show.legend = TRUE, lty=4) + 
  geom_hline(aes(yintercept=-ci_95, color="CI_95"), lty=4) +
  geom_hline(aes(yintercept=-ci_99, color="CI_99"), show.legend = TRUE, lty=4) +
  geom_hline(aes(yintercept=ci_99, color="CI_99"), lty=4) +
  scale_x_continuous(name="lag", breaks=waiver(), label=waiver()) +
  scale_y_continuous(name="pacf value", breaks=waiver(), labels=NULL,
                     sec.axis = sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Conf. Inter.", labels=c("90%","95%","99%"),
                     values=c(CI_90="red", CI_95="blue", CI_99="green")) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0),
        legend.key.width = unit(0.8,"cm"), legend.position="bottom")

The correlograms of the residuals clearly confirm the autocorrelation of MA(1) type.
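The identification logic can also be checked against the theoretical autocorrelations: for an MA(1), \(\rho(1)=\theta/(1+\theta^{2})\) and \(\rho(k)=0\) for \(k\geq 2\), while the PACF decays geometrically. A sketch with an illustrative \(\theta=0.8\):

```r
# Theoretical ACF of an MA(1): a single non-zero autocorrelation at lag 1.
theta <- 0.8                                   # illustrative value
rho_1 <- theta / (1 + theta^2)
round(rho_1, 4)                                # 0.4878
round(ARMAacf(ma = theta, lag.max = 4), 4)     # lags >= 2 are exactly 0
round(ARMAacf(ma = theta, lag.max = 4, pacf = TRUE), 4)  # decaying PACF
```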

We estimate the memory parameter.

# library(astsa)
y <- Gauss_MA1_lm_df$x_r_res
MA1_x_r_res <- sarima(y, p=0, d=0, q=1, no.constant = TRUE)
## initial  value 2.498909 
## iter   2 value 2.355961
## iter   3 value 2.331867
## iter   4 value 2.312252
## iter   5 value 2.308011
## iter   6 value 2.305721
## iter   7 value 2.305355
## iter   8 value 2.305341
## iter   9 value 2.305341
## iter   9 value 2.305341
## iter   9 value 2.305341
## final  value 2.305341 
## converged
## initial  value 2.308596 
## iter   2 value 2.308488
## iter   3 value 2.308488
## iter   3 value 2.308488
## iter   3 value 2.308488
## final  value 2.308488 
## converged

show(MA1_x_r_res)
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ma1
##       0.7859
## s.e.  0.0799
## 
## sigma^2 estimated as 100.5:  log likelihood = -562.84,  aic = 1129.68
## 
## $degrees_of_freedom
## [1] 150
## 
## $ttable
##     Estimate     SE t.value p.value
## ma1   0.7859 0.0799  9.8352       0
## 
## $AIC
## [1] 7.481343
## 
## $AICc
## [1] 7.48152
## 
## $BIC
## [1] 7.521307
y <- Gauss_MA1_lm_df$x_r_res
MA2_x_r_res <- sarima(y, p=0, d=0, q=2, no.constant = TRUE)
## initial  value 2.498909 
## iter   2 value 2.346585
## iter   3 value 2.313332
## iter   4 value 2.302270
## iter   5 value 2.298290
## iter   6 value 2.297341
## iter   7 value 2.295567
## iter   8 value 2.295422
## iter   9 value 2.295410
## iter  10 value 2.295410
## iter  11 value 2.295410
## iter  11 value 2.295410
## final  value 2.295410 
## converged
## initial  value 2.299652 
## iter   2 value 2.299574
## iter   3 value 2.299550
## iter   4 value 2.299550
## iter   4 value 2.299550
## iter   4 value 2.299550
## final  value 2.299550 
## converged

show(MA2_x_r_res)
## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ma1      ma2
##       0.7146  -0.1496
## s.e.  0.0795   0.0891
## 
## sigma^2 estimated as 98.56:  log likelihood = -561.49,  aic = 1128.98
## 
## $degrees_of_freedom
## [1] 149
## 
## $ttable
##     Estimate     SE t.value p.value
## ma1   0.7146 0.0795  8.9874  0.0000
## ma2  -0.1496 0.0891 -1.6785  0.0953
## 
## $AIC
## [1] 7.476712
## 
## $AICc
## [1] 7.477249
## 
## $BIC
## [1] 7.536658
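The same comparison can be run directly with `stats::arima` (which `sarima` wraps, up to its reporting): a lower information criterion and an insignificant `ma2` z-statistic both point to the MA(1) specification. A sketch on an illustrative simulated series, not the notes' path:

```r
# Comparing MA(1) and MA(2) fits on a simulated MA(1) series.
set.seed(1)
y <- arima.sim(model = list(ma = 0.8), n = 151, sd = 10)
fit1 <- arima(y, order = c(0, 0, 1), include.mean = FALSE)
fit2 <- arima(y, order = c(0, 0, 2), include.mean = FALSE)
c(AIC(fit1), AIC(fit2), BIC(fit1), BIC(fit2))
# approximate z-statistic for the extra ma2 coefficient:
unname(fit2$coef["ma2"] / sqrt(diag(fit2$var.coef)["ma2"]))
```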

The ma2 coefficient is not significantly different from zero at the 5% level (p-value 0.0953), so we retain the MA(1) specification. We extract the residuals from the MA(1) model and test them for normality.

MA1_x_r_res_res <- MA1_x_r_res$fit$residuals

Jarque-Bera (JB) test

# Jarque-Bera (*JB*) test.
# library(tseries)
y <- MA1_x_r_res_res
y_JB <- jarque.bera.test(y)
show(y_JB)
## 
##  Jarque Bera Test
## 
## data:  y
## X-squared = 1.6211, df = 2, p-value = 0.4446
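The JB statistic combines the sample skewness \(S\) and kurtosis \(K\): \(JB = n\left(S^{2}/6 + (K-3)^{2}/24\right)\), asymptotically \(\chi^{2}_{2}\) under normality. A minimal sketch on a toy Gaussian sample (illustrative, not the MA(1) residuals):

```r
# Jarque-Bera statistic from sample moments; tseries::jarque.bera.test
# computes the same quantity.
set.seed(1)
z <- rnorm(200)                                 # illustrative sample
n <- length(z)
m2 <- mean((z - mean(z))^2)
S <- mean((z - mean(z))^3) / m2^(3/2)           # sample skewness
K <- mean((z - mean(z))^4) / m2^2               # sample kurtosis
JB <- n * (S^2 / 6 + (K - 3)^2 / 24)
p_jb <- pchisq(JB, df = 2, lower.tail = FALSE)  # large p-value: do not reject
p_jb
```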

Shapiro-Wilk (SW) test.

# Shapiro-Wilk (*SW*) test.
# library(stats)
y <- MA1_x_r_res_res
y_SW <- shapiro.test(y)
show(y_SW)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.99249, p-value = 0.6152

D'Agostino-Pearson (DP) test.

# D'Agostino-Pearson (*DP*) test.
# library(fBasics)
y <- MA1_x_r_res_res
y_DP <- dagoTest(y)
show(y_DP)
## 
## Title:
##  D'Agostino Normality Test
## 
## Test Results:
##   STATISTIC:
##     Chi2 | Omnibus: 2.1605
##     Z3  | Skewness: -0.179
##     Z4  | Kurtosis: -1.4589
##   P VALUE:
##     Omnibus  Test: 0.3395 
##     Skewness Test: 0.8579 
##     Kurtosis Test: 0.1446

Then, we show the density histogram of the residuals of the MA(1) model, compared with the density curve of the centered Gaussian distribution with the same standard deviation.

Gauss_MA1_lm_df <- add_column(Gauss_MA1_lm_df, x_r_res_res=MA1_x_r_res_res, .after="x_r_res")
Data_df <- Gauss_MA1_lm_df
title_content <- bquote(atop("University of Roma \"Tor Vergata\" - Essentials of Time Series Analysis \u0040 MPSMF 2022-2023", "Density Histogram of the Residuals of the MA(1) Model for the Residuals of the Linear Model for the Black-Red Path of a Gaussian MA(1)"))
subtitle_content <- bquote(paste("path length ", .(length), " sample points"))
caption_content <- "Author: Roberto Monte"
x_breaks_num <- 10
x_binwidth <- round((max(Data_df$x_r_res_res)-min(Data_df$x_r_res_res))/x_breaks_num, digits=2)
x_breaks_low <- floor((min(Data_df$x_r_res_res)/x_binwidth))*x_binwidth
x_breaks_up <- ceiling((max(Data_df$x_r_res_res)/x_binwidth))*x_binwidth
x_breaks <- c(seq(from=x_breaks_low, to=x_breaks_up, by=x_binwidth))
x_labs <- format(x_breaks, scientific=FALSE)
# x_lims <- c((x_breaks_low-1.0*x_binwidth), (x_breaks_up+1.0*x_binwidth))
Data_df_DH <- ggplot(Data_df, aes(x=x_r_res_res)) +
  geom_histogram(binwidth = x_binwidth, aes(y=..density..), # bins=2  # density histogram
                 color="black", fill="blue", alpha=0.5)+
  stat_function(fun=dnorm, colour = "red", args = list(mean=0, sd=sd(Data_df$x_r_res_res))) +
  
  #  scale_x_continuous(name="Sample Data", breaks=waiver(), labels=waiver()) +
  scale_x_continuous(name="Sample Data", breaks=x_breaks, labels=x_labs
                     # , limits=x_lims
  ) +
  scale_y_continuous(name="Data Density", breaks=waiver(), labels=NULL,
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content, caption=caption_content) +
  theme(plot.title=element_text(hjust = 0.5), 
        plot.subtitle=element_text(hjust =  0.5),
        plot.caption = element_text(hjust = 1.0))
plot(Data_df_DH)

Since we have

sd(Data_df$x_r_res_res)
## [1] 10.0605

a “good model” for the black-red time series is the MA(1) process \(\left(X_{t}\right)_{t\in\mathbb{N_{0}}}\) given by \[\begin{equation} X_{t} = \hat{\alpha} + \hat{\beta}t + \hat{\theta}W_{t-1} + W_{t}, \end{equation}\] where \(X_{0}=0\), \(\hat{\alpha}=3.512\), \(\hat{\beta}=2.155\), \(\hat{\theta}=0.7859\), and \(\left(W_{t}\right)_{t\in\mathbb{N_{0}}}\equiv\mathbf{W}\) is a Gaussian white noise with estimated standard deviation \(\hat{\sigma}_{\mathbf{W}}=10.0605\).
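As a sanity check, we can simulate one path from this fitted model and compare it with the observed black-red path. The sketch below uses the estimates quoted above; the path length `t_max` is a hypothetical placeholder for the actual sample size.

```r
# A minimal sketch: simulate one path of the fitted trend + MA(1) model
# X_t = alpha + beta*t + theta*W_{t-1} + W_t, using the estimates above.
# The path length t_max is hypothetical.
set.seed(1)
alpha_hat   <- 3.512
beta_hat    <- 2.155
theta_hat   <- 0.7859
sigma_W_hat <- 10.0605
t_max <- 151
W <- rnorm(t_max + 1, mean = 0, sd = sigma_W_hat)  # W_0, W_1, ..., W_{t_max}
t_idx <- 1:t_max
X <- alpha_hat + beta_hat * t_idx + theta_hat * W[t_idx] + W[t_idx + 1]
# The simulated path fluctuates around the fitted linear trend alpha + beta*t.
```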

13.2 Moving Avg. Processes of Order \(q\) - MA(q) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\).

Definition 13.4 (MA(q) processes) We say that \(\mathbf{X}\) is an \(N\)-variate real moving average process of order \(q\), for some \(q\in\mathbb{N}\), if there exist \(\mu\in\mathbb{R}^{N}\), \(\Theta_{1},\dots,\Theta_{q}\in\mathbb{R}^{N}\times\mathbb{R}^{N}\), and a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) on \(\Omega\) with time set \(\left\{1-q,2-q,\dots,-1\right\}\cup\mathbb{N}_{0}\), states in \(\mathbb{R}^{N}\), and variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric \(\Sigma_{\mathbf{W}}^{2}\) in \(\mathbb{R}^{N}\times\mathbb{R}^{N}\), such that the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\mu + W_{t} -\Theta_{1}W_{t-1} -\Theta_{2}W_{t-2} -\cdots -\Theta_{q}W_{t-q}, \tag{13.41} \end{equation}\] for every \(t\in\mathbb{N}\). The vector \(\mu\) is [resp. the matrices \(\Theta_{1},\dots,\Theta_{q}\) are] referred to as the mean [resp. the memory weights] of the moving average process \(\mathbf{X}\); when \(\mu=0\), the moving average process \(\mathbf{X}\) is said to be demeaned; the strong white noise \(\mathbf{W}\) is referred to as the state innovation of \(\mathbf{X}\). The explicit reference to the state innovation \(\mathbf{W}\) is often omitted when not necessary, though.

To denote that \(\mathbf{X}\) is an \(N\)-variate real moving average process of order \(q\), we write \(\mathbf{X}\sim MA(q)^{N}\). In case \(N=1\), we usually speak of a real moving average process of order \(q\), neglect to mention \(N\), and write \(\mathbf{X}\sim MA(q)\). We also write \(\theta_{1},\dots,\theta_{q}\) for the memory weight parameters rather than \(\Theta_{1},\dots,\Theta_{q}\), and write \(\sigma_{\mathbf{W}}^{2}\) for the variance of the innovation process rather than \(\Sigma_{\mathbf{W}}^{2}\).

In some circumstances, it is more appropriate to consider a moving average process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) and a state innovation \(\left(W_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{W}\) with time set \(\mathbb{T}\equiv\mathbb{Z}\). In this case, we say that \(\mathbf{X}\) is a moving average process of order \(q\) if the random variables in \(\mathbf{X}\) satisfy Equation (13.41), for every \(t\in\mathbb{Z}\).

In what follows, we restrict our attention to \(MA(q)\) processes for which \(\mu\) and \(\theta_{1},\dots,\theta_{q}\) are all real numbers and \(\mathbf{W}\) is a real strong white noise with variance \(\sigma_{\mathbf{W}}^{2}\), for some \(\sigma_{\mathbf{W}}>0\). Recall that in this case both the autocovariance and the autocorrelation functions of \(\mathbf{X}\) are symmetric, that is \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\gamma_{\mathbf{X}}\left(t,s\right) \quad\text{and}\quad \rho_{\mathbf{X}}\left(s,t\right)=\rho_{\mathbf{X}}\left(t,s\right) \end{equation}\] for all \(s,t\in\mathbb{Z}\).

Remark (Independence of the random variables in an MA(q) process). For all \(s,t\in\mathbb{Z}\) such that \(\left\vert t-s\right\vert>q\), the random variables \(X_{s}\) and \(X_{t}\) in \(\mathbf{X}\) are independent. Indeed, they are functions of disjoint sets of the independent random variables in \(\mathbf{W}\).

Proposition 13.14 (MA(q) mean function) The mean function \(\mu_{\mathbf{X}}:\mathbb{Z}\rightarrow\mathbb{R}\) of \(\mathbf{X}\) is given by \[\begin{equation} \mu_{\mathbf{X}}\left(t\right)=\mu, \tag{13.42} \end{equation}\] for every \(t\in\mathbb{Z}\).

Proposition 13.15 (MA(q) variance function) The variance function \(\sigma_{\mathbf{X}}^{2}:\mathbb{Z}\rightarrow\mathbb{R}\) of \(\mathbf{X}\) is given by \[\begin{equation} \sigma_{\mathbf{X}}^{2}\left(t\right)=\left(1+\sum\limits_{\ell=1}^{q}\theta_{\ell}^{2}\right)\sigma_{\mathbf{W}}^{2}, \tag{13.43} \end{equation}\] for every \(t\in\mathbb{Z}\).

Proposition 13.16 (MA(q) autocovariance function) The autocovariance function \(\gamma_{\mathbf{X}}:\mathbb{Z\times Z}\rightarrow\mathbb{R}\) of \(\mathbf{X}\) is given by \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} \left(\sum\limits_{\ell=0}^{q-\left\vert t-s\right\vert}\theta_{\ell}\theta_{\ell+\left\vert t-s\right\vert }\right) \sigma_{\mathbf{W}}^{2}, & \text{if }0\leq\left\vert t-s\right\vert \leq q,\\ 0, & \text{if }\left\vert t-s\right\vert >q, \end{array} \right. \tag{13.44} \end{equation}\] where \(\theta_{0}\equiv-1\), the sign accounting for the minus signs in Equation (13.41).

Proof. Having set \(\theta_{0}\equiv-1\), when \(\left\vert t-s\right\vert=0\) Equation (13.44) is just a reformulation of Equation (13.43). Assuming \(s<t\) and observing that \(X_{s}-\mathbf{E}\left[X_{s}\right]=-\left(\theta_{0}W_{s}+\theta_{1}W_{s-1}+\cdots+\theta_{q}W_{s-q}\right)\), we can write \[\begin{align} \gamma_{\mathbf{X}}\left(s,t\right) & =\mathbf{E}\left[\left(X_{s}-\mathbf{E}\left[X_{s}\right]\right)\left(X_{t}-\mathbf{E}\left[X_{t}\right]\right)\right]\\ & =\mathbf{E}\left[\left(\theta_{0}W_{s}+\theta_{1}W_{s-1}+\cdots +\theta_{q-1}W_{s-q+1}+\theta_{q}W_{s-q}\right) \left(\theta_{0}W_{t}+\theta_{1}W_{t-1}+\cdots+\theta_{q-1}W_{t-q+1}+\theta_{q}W_{t-q}\right)\right]\\ & =\theta_{0}^{2}\mathbf{E}\left[W_{s}W_{t}\right]+\theta_{0}\theta_{1}\mathbf{E}\left[W_{s}W_{t-1}\right]+\cdots +\theta_{0}\theta_{q-1}\mathbf{E}\left[W_{s}W_{t-q+1}\right]+\theta_{0}\theta_{q}\mathbf{E}\left[W_{s}W_{t-q}\right]\\ & +\theta_{1}\theta_{0}\mathbf{E}\left[W_{s-1}W_{t}\right]+\theta_{1}^{2}\mathbf{E}\left[W_{s-1}W_{t-1}\right]+\cdots +\theta_{1}\theta_{q-1}\mathbf{E}\left[W_{s-1}W_{t-q+1}\right]+\theta_{1}\theta_{q}\mathbf{E}\left[W_{s-1}W_{t-q}\right]\\ & +\cdots+\theta_{q-1}\theta_{0}\mathbf{E}\left[ W_{s-q+1}W_{t}\right]+\theta_{q-1}\theta_{1}\mathbf{E}\left[W_{s-q+1}W_{t-1}\right]+\cdots +\theta_{q-1}^{2}\mathbf{E}\left[W_{s-q+1}W_{t-q+1}\right]+\theta_{q-1}\theta_{q}\mathbf{E}\left[W_{s-q+1}W_{t-q}\right]\\ & +\theta_{q}\theta_{0}\mathbf{E}\left[W_{s-q}W_{t}\right]+\theta_{q}\theta_{1}\mathbf{E}\left[W_{s-q}W_{t-1}\right]+\cdots +\theta_{q}\theta_{q-1}\mathbf{E}\left[W_{s-q}W_{t-q+1}\right]+\theta_{q}^{2}\mathbf{E}\left[W_{s-q}W_{t-q}\right] \tag{13.45} \end{align}\] Writing \(s=t-r\), Equation (13.45) becomes
\[\begin{align} \gamma_{\mathbf{X}}\left(s,t\right) & =\theta_{0}^{2}\mathbf{E}\left[W_{t-r}W_{t}\right]+\theta_{0}\theta_{1}\mathbf{E}\left[W_{t-r}W_{t-1}\right]+\cdots +\theta_{0}\theta_{q-1}\mathbf{E}\left[W_{t-r}W_{t-q+1}\right]+\theta_{0}\theta_{q}\mathbf{E}\left[W_{t-r}W_{t-q}\right]\\ & +\theta_{1}\theta_{0}\mathbf{E}\left[W_{t-r-1}W_{t}\right]+\theta_{1}^{2}\mathbf{E}\left[W_{t-r-1}W_{t-1}\right]+\cdots +\theta_{1}\theta_{q-1}\mathbf{E}\left[W_{t-r-1}W_{t-q+1}\right]+\theta_{1}\theta_{q}\mathbf{E}\left[W_{t-r-1}W_{t-q}\right]\\ & +\cdots+\theta_{q-1}\theta_{0}\mathbf{E}\left[W_{t-r-q+1}W_{t}\right]+\theta_{q-1}\theta_{1}\mathbf{E}\left[W_{t-r-q+1}W_{t-1}\right]+\cdots +\theta_{q-1}^{2}\mathbf{E}\left[W_{t-r-q+1}W_{t-q+1}\right]+\theta_{q-1}\theta_{q}\mathbf{E}\left[W_{t-r-q+1}W_{t-q}\right]\\ & +\theta_{q}\theta_{0}\mathbf{E}\left[W_{t-r-q}W_{t}\right]+\theta_{q}\theta_{1}\mathbf{E}\left[W_{t-r-q}W_{t-1}\right]+\cdots +\theta_{q}\theta_{q-1}\mathbf{E}\left[W_{t-r-q}W_{t-q+1}\right]+\theta_{q}^{2}\mathbf{E}\left[W_{t-r-q}W_{t-q}\right]. \end{align}\] Since \(\mathbf{E}\left[W_{u}W_{v}\right]=\sigma_{\mathbf{W}}^{2}\) when \(u=v\) and \(0\) otherwise, \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} \left(\theta_{0}\theta_{1}+\theta_{1}\theta_{2}+\cdots+\theta_{q-1}\theta_{q}\right)\sigma_{\mathbf{W}}^{2}, & \text{if }r=1,\\ \left(\theta_{0}\theta_{2}+\theta_{1}\theta_{3}+\cdots+\theta_{q-2}\theta_{q}\right)\sigma_{\mathbf{W}}^{2}, & \text{if }r=2,\\ \cdots & \cdots\\ \left(\theta_{0}\theta_{q-1}+\theta_{1}\theta_{q}\right)\sigma_{\mathbf{W}}^{2}, & \text{if }r=q-1,\\ \theta_{0}\theta_{q}\sigma_{\mathbf{W}}^{2}, & \text{if }r=q,\\ 0, & \text{if }r>q. \end{array} \right. \tag{13.46} \end{equation}\] For \(r\leq q\), Equation (13.46) can be more concisely rewritten as \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\left(\sum\limits_{\ell=0}^{q-r}\theta_{\ell}\theta_{\ell+r}\right)\sigma_{\mathbf{W}}^{2},\quad r=1,\dots,q. \end{equation}\] From the latter, on account of the symmetry property of the autocovariance function, the desired Equation (13.44) follows.

Proposition 13.17 (MA(q) autocorrelation function) The autocorrelation function \(\rho_{\mathbf{X}}:\mathbb{Z\times Z}\rightarrow\mathbb{R}\) of \(\mathbf{X}\) is given by \[\begin{equation} \rho_{\mathbf{X}}\left(s,t\right)=\left\{ \begin{array} [c]{ll} \frac{\sum\limits_{\ell=0}^{q-\left\vert t-s\right\vert}\theta_{\ell}\theta_{\ell+\left\vert t-s\right\vert}} {\sum\limits_{\ell=0}^{q}\theta_{\ell}^{2}}, & \text{if }0\leq\left\vert t-s\right\vert \leq q,\\ 0, & \text{if }\left\vert t-s\right\vert >q, \end{array} \right. \tag{13.47} \end{equation}\] where \(\theta_{0}\equiv-1\).
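Equation (13.47) can be checked numerically against R's `ARMAacf`. Note that R writes the MA part with plus signs, so its `ma` argument equals \(-\theta\) in the notation of Equation (13.41); the memory weights below are hypothetical.

```r
# Theoretical ACF of an MA(2) with hypothetical memory weights theta_1, theta_2
# in the convention of Equation (13.41):
# X_t = mu + W_t - theta_1 W_{t-1} - theta_2 W_{t-2}.
theta <- c(0.7, -0.3)
q <- length(theta)
psi <- c(1, -theta)  # coefficients of W_t, W_{t-1}, W_{t-2}
# Equation (13.47): rho(h) = sum_l psi_l psi_{l+h} / sum_l psi_l^2
rho <- sapply(1:q, function(h) {
  sum(psi[1:(q + 1 - h)] * psi[(1 + h):(q + 1)]) / sum(psi^2)
})
# Cross-check with stats::ARMAacf, whose MA convention uses plus signs,
# hence ma = -theta.
rho_R <- ARMAacf(ma = -theta, lag.max = q)[-1]  # drop lag 0
all.equal(unname(rho_R), rho)
## [1] TRUE
```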

Proposition 13.18 (MA(q) reduced autocorrelation function) Choosing \(t_{0}\equiv 0\), the reduced autocorrelation function \(\rho_{\mathbf{X},0}:\mathbb{Z}\rightarrow\mathbb{R}\) of \(\mathbf{X}\) is given by \[\begin{equation} \rho_{\mathbf{X},0}\left(h\right)=\left\{ \begin{array} [c]{ll} \frac{\sum\limits_{\ell=0}^{q-\left\vert h\right\vert }\theta_{\ell}\theta_{\ell+\left\vert h\right\vert}} {\sum\limits_{\ell=0}^{q}\theta_{\ell}^{2}}, & \text{if }0\leq\left\vert h\right\vert \leq q,\\ 0, & \text{if }\left\vert h\right\vert >q, \end{array} \right. \tag{13.48} \end{equation}\] where \(\theta_{0}\equiv -1\).

Proposition 13.19 (Invertible MA(q) processes) Under the assumption \(\mu=0\), an \(MA(q)\) process \(\mathbf{X}\) is invertible if and only if the roots of the polynomial \[\begin{equation} 1-\theta_{1}z-\theta_{2}z^{2}-\cdots-\theta_{q-1}z^{q-1}-\theta_{q}z^{q} \end{equation}\] are outside the unit circle in the complex plane \(\mathbb{C}\).
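The root condition can be checked in R with `polyroot`; the memory weights below are hypothetical.

```r
# Invertibility check for an MA(2) with hypothetical memory weights:
# the roots of 1 - theta_1 z - theta_2 z^2 must lie outside the unit circle.
theta <- c(0.7, -0.15)
roots <- polyroot(c(1, -theta))  # coefficients in increasing powers of z
Mod(roots)                       # moduli of the two (complex conjugate) roots
all(Mod(roots) > 1)
## [1] TRUE
```

Since both roots have modulus greater than \(1\), this hypothetical MA(2) process is invertible.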

14 Autoregressive Moving Avg. Processes of Orders \(p\) and \(q\) - ARMA(p,q) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be a stochastic process on a probability space \(\Omega\) with states in \(\mathbb{R}^{N}\), for some \(N\in\mathbb{N}\). We assume that the random variable \(X_{0}\) has finite moment of order \(2\) and set \(\mathbf{E}\left[X_{0}\right]\equiv\mu_{X_{0}}\) and \(Var\left(X_{0}\right)\equiv\Sigma_{X_{0}}\).

Definition 14.1 (ARMA(p,q) processes) We say that \(\mathbf{X}\) is an \(N\)-variate real autoregressive moving average process of orders \(p\) and \(q\), for some \(p,q\in\mathbb{N}\), with drift and linear trend if there exist \(\alpha, \beta\in\mathbb{R}^{N}\), \(\Phi_{1},\dots,\Phi_{p},\Theta_{1},\dots,\Theta_{q}\in\mathbb{R}^{N}\times\mathbb{R}^{N}\), and a strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{W}\) on \(\Omega\) with states in \(\mathbb{R}^{N}\) and variance-covariance matrix \(\Sigma_{\mathbf{W}}^{2}\), for some positive definite symmetric matrix \(\Sigma_{\mathbf{W}}^{2}\), such that \(X_{0}\) is independent of \(\mathbf{W}\) and the random variables in \(\mathbf{X}\) satisfy the equations \[\begin{equation} \begin{array} [c]{l} X_{1}=\alpha+\beta+\Phi_{1}X_{0}+W_{1}-\Theta_{1}W_{0},\\ X_{2}=\alpha+2\beta+\Phi_{1}X_{1}+\Phi_{2}X_{0}+W_{2}-\Theta_{1}W_{1}-\Theta_{2}W_{0},\\ \cdots\\ X_{p}=\alpha+p\beta+\Phi_{1}X_{p-1}+\Phi_{2}X_{p-2}+\cdots+\Phi_{p-1}X_{1}+\Phi_{p}X_{0}+W_{p}\\ -\Theta_{1}W_{p-1}-\Theta_{2}W_{p-2}-\cdots-\Theta_{p-1}W_{1}-\Theta_{p}W_{0},\\ X_{p+1}=\alpha+\left( p+1\right) \beta+\Phi_{1}X_{p}+\Phi_{2}X_{p-1}+\cdots+\Phi_{p-1}X_{2}+\Phi_{p}X_{1}+W_{p+1}\\ -\Theta_{1}W_{p}-\Theta_{2}W_{p-1}-\cdots-\Theta_{p}W_{1}-\Theta_{p+1}W_{0},\\ \cdots\\ X_{q}=\alpha+q\beta+\Phi_{1}X_{q-1}+\Phi_{2}X_{q-2}+\cdots+\Phi_{p-1}X_{q-p+1}+\Phi_{p}X_{q-p}+W_{q}\\ -\Theta_{1}W_{q-1}-\Theta_{2}W_{q-2}-\cdots-\Theta_{q-1}W_{1}-\Theta_{q}W_{0},\\ X_{t}=\alpha+\beta t+\Phi_{1}X_{t-1}+\Phi_{2}X_{t-2}+\cdots+\Phi_{p-1}X_{t-p+1}+\Phi_{p}X_{t-p}+W_{t}\\ -\Theta_{1}W_{t-1}-\Theta_{2}W_{t-2}-\cdots-\Theta_{q-1}W_{t-q+1}-\Theta_{q}W_{t-q},\quad\forall t\geq q, \end{array} \quad\text{when }q\geq p \tag{14.1} \end{equation}\] or \[\begin{equation} \begin{array} [c]{l} X_{1}=\alpha+\beta+\Phi_{1}X_{0}+W_{1}-\Theta_{1}W_{0},\\ X_{2}=\alpha+2\beta+\Phi_{1}X_{1}+\Phi_{2}X_{0}+W_{2}-\Theta_{1}W_{1}-\Theta_{2}W_{0},\\ \cdots\\ 
X_{q}=\alpha+q\beta+\Phi_{1}X_{q-1}+\Phi_{2}X_{q-2}+\cdots+\Phi_{q-1}X_{1}+\Phi_{q}X_{0}+W_{q}\\ -\Theta_{1}W_{q-1}-\Theta_{2}W_{q-2}-\cdots-\Theta_{q-1}W_{1}-\Theta_{q}W_{0},\\ X_{q+1}=\alpha+\left( q+1\right) \beta+\Phi_{1}X_{q}+\Phi_{2}X_{q-1}+\cdots+\Phi_{q}X_{1}+\Phi_{q+1}X_{0}+W_{q+1}\\ -\Theta_{1}W_{q}-\Theta_{2}W_{q-1}-\cdots-\Theta_{q}W_{1},\\ \cdots\\ X_{p}=\alpha+p\beta+\Phi_{1}X_{p-1}+\Phi_{2}X_{p-2}+\cdots+\Phi_{p-1}X_{1}+\Phi_{p}X_{0}+W_{p}\\ -\Theta_{1}W_{p-1}-\Theta_{2}W_{p-2}-\cdots-\Theta_{q-1}W_{p-q+1}-\Theta_{q}W_{p-q},\\ X_{t}=\alpha+\beta t+\Phi_{1}X_{t-1}+\Phi_{2}X_{t-2}+\cdots+\Phi_{p-1}X_{t-p+1}+\Phi_{p}X_{t-p}+W_{t}\\ -\Theta_{1}W_{t-1}-\Theta_{2}W_{t-2}-\cdots-\Theta_{q-1}W_{t-q+1}-\Theta_{q}W_{t-q},\quad\forall t\geq p, \end{array} \quad\text{when }p\geq q \tag{14.2} \end{equation}\] The random vector \(X_{0}\) [resp. the distribution of the random vector \(X_{0}\)] is referred to as the initial state [resp. the initial distribution] of the autoregressive moving average process \(\mathbf{X}\); in case \(X_{0}\equiv x_{0}\in\mathbb{R}^{N}\), we also call \(x_{0}\) the starting point of \(\mathbf{X}\); the vector \(\alpha\) [resp. \(\beta\)] is referred to as the drift [resp. the linear trend coefficient] of the autoregressive moving average process \(\mathbf{X}\); when we want to stress that \(\alpha\neq0\) and \(\beta=0\) [resp. \(\alpha=0\) and \(\beta\neq0\)] we call \(\mathbf{X}\) an autoregressive moving average process with drift and no linear trend [resp. with linear trend and no drift]; the matrices \(\Phi_{1},\Phi_{2},\dots,\Phi_{p}\) [resp. \(\Theta_{1},\Theta_{2},\dots,\Theta_{q}\)] are referred to as the regression coefficients [resp. the memory weights] of \(\mathbf{X}\); the strong white noise \(\mathbf{W}\) is referred to as the state innovation of the autoregressive moving average process \(\mathbf{X}\). The explicit reference to the state innovation \(\mathbf{W}\) is often omitted when not necessary, though.

To denote that \(\mathbf{X}\) is an \(N\)-variate real autoregressive moving average process of orders \(p\) and \(q\), we write \(\mathbf{X}\sim ARMA^{N}(p,q)\). In case \(N=1\), we usually speak of a real autoregressive moving average process of orders \(p\) and \(q\), neglect to mention \(N\), and write \(\mathbf{X}\sim ARMA(p,q)\). We also write \(\phi_{1},\dots,\phi_{p}\) [resp. \(\theta_{1},\dots,\theta_{q}\)] for the regression coefficients [resp. the memory weights] rather than \(\Phi_{1},\dots,\Phi_{p}\) [resp. \(\Theta_{1},\dots,\Theta_{q}\)], and write \(\sigma_{\mathbf{W}}^{2}\) for the variance of the innovation process rather than \(\Sigma_{\mathbf{W}}^{2}\).

In several circumstances, one considers a process \(\left(X_{t}\right)_{t\in\mathbb{T}_{p}}\equiv\mathbf{X}\) with time set \(\mathbb{T}_{p}\equiv\left\{1-p,2-p,\dots,-1\right\}\cup\mathbb{N}_{0}\), for some \(p\in\mathbb{N}\), where the negative indices \(1-p,2-p,\dots,-1\in\mathbb{T}_{p}\) are intended to represent past times with respect to the current time \(0\). In this case, we say that \(\mathbf{X}\) is an \(N\)-variate real autoregressive moving average process of orders \(p\) and \(q\), for some \(q\in\mathbb{N}\), with drift and linear trend if there exist \(\alpha,\beta,\Phi_{1},\dots,\Phi_{p},\Theta_{1},\dots,\Theta_{q}\) as in Definition 14.1, and a state innovation process \(\left(W_{t}\right)_{t\in\mathbb{T}_{q}}\equiv\mathbf{W}\) with time set \(\mathbb{T}_{q}\equiv\left\{1-q,2-q,\dots,-1\right\}\cup\mathbb{N}_{0}\), where also the negative indices \(1-q,2-q,\dots,-1\in\mathbb{T}_{q}\) are intended to represent past times with respect to the current time \(0\), such that the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\alpha+\beta t+\Phi_{1}X_{t-1}+\Phi_{2}X_{t-2}+\cdots+\Phi_{p-1}X_{t-(p-1)}+\Phi_{p}X_{t-p}+W_{t}\\ -\Theta_{1}W_{t-1}-\Theta_{2}W_{t-2}-\cdots-\Theta_{q-1}W_{t-(q-1)}-\Theta_{q}W_{t-q}, \tag{14.3} \end{equation}\] for every \(t\in\mathbb{N}\). In this case, the random variables \(X_{1-p},X_{2-p},\dots,X_{-1}\) are called past states of the process, and the random variables of the state innovation \(\mathbf{W}\) are assumed to be independent of all states of the process which are prior to them. Typically, relying on the argument that the realizations of the past states of a stochastic process have been observed, the random variables \(X_{1-p},X_{2-p},\dots,X_{-1},X_{0}\) are assumed to be Dirac random variables concentrated at some points \(x_{1-p},x_{2-p},\dots,x_{-1},x_{0}\in\mathbb{R}^{N}\).

In other circumstances, it is more appropriate to consider a process \(\left(X_{t}\right)_{t\in\mathbb{T}}\equiv\mathbf{X}\) with time set \(\mathbb{T}\equiv\mathbb{Z}\). In this case we say that \(\mathbf{X}\) is an \(N\)-variate autoregressive moving average process of orders \(p\) and \(q\), for some \(p,q\in\mathbb{N}\), with drift and linear trend if there exist \(\alpha,\beta,\Phi_{1},\dots,\Phi_{p},\Theta_{1},\dots,\Theta_{q}\) as in Definition 14.1, and a state innovation process \(\left(W_{t}\right)_{t\in\mathbb{T}_{q}}\equiv\mathbf{W}\) with time set \(\mathbb{T}_{q}\equiv\mathbb{Z}\) such that the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\alpha+\beta t+\Phi_{1}X_{t-1}+\Phi_{2}X_{t-2}+\cdots+\Phi_{p-1}X_{t-(p-1)}+\Phi_{p}X_{t-p}+W_{t}\\ -\Theta_{1}W_{t-1}-\Theta_{2}W_{t-2}-\cdots-\Theta_{q-1}W_{t-(q-1)}-\Theta_{q}W_{t-q}, \tag{14.4} \end{equation}\] for every \(t\in\mathbb{Z}\). In this case, past, current and future states of the process, corresponding to negative, zero and positive time indices, respectively, are all intended to be random variables and the random variables of the state innovation \(\mathbf{W}\) are assumed to be independent of all states of the process which are prior to them.

In what follows, we restrict our attention to \(ARMA(p,q)\) processes for which \(\alpha\), \(\beta\), and \(\phi_{1},\dots,\phi_{p}, \theta_{1},\dots,\theta_{q}\) are all real numbers and \(\mathbf{W}\) is a real strong white noise with variance \(\sigma_{\mathbf{W}}^{2}\), for some \(\sigma_{\mathbf{W}}>0\). Recall that in this case both the autocovariance and the autocorrelation functions of \(\mathbf{X}\) are symmetric, that is \[\begin{equation} \gamma_{\mathbf{X}}\left(s,t\right)=\gamma_{\mathbf{X}}\left(t,s\right) \quad\text{and}\quad \rho_{\mathbf{X}}\left(s,t\right)=\rho_{\mathbf{X}}\left(t,s\right) \end{equation}\] for all \(s,t\in\mathbb{Z}\).
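A demeaned (\(\alpha=\beta=0\)) causal, invertible real ARMA path can be simulated with `stats::arima.sim`. The coefficients below are hypothetical; note again that R writes the MA part with plus signs, so its `ma` argument equals \(-\theta\) in the notation of Equation (14.4).

```r
# A minimal sketch: simulating a demeaned real ARMA(2,1) path with
# hypothetical coefficients phi_1, phi_2, theta_1.
set.seed(1)
phi   <- c(0.5, -0.3)
theta <- 0.4
# arima.sim checks stationarity of the AR part and uses the plus-sign MA
# convention, hence ma = -theta.
x <- arima.sim(model = list(ar = phi, ma = -theta), n = 200, sd = 1)
length(x)
## [1] 200
```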

Proposition 14.1 (Causal ARMA(p,q) processes) Under the assumption \(\alpha=\beta=0\), an \(ARMA(p,q)\) process \(\mathbf{X}\) is causal if and only if the roots of the polynomial \[\begin{equation} 1-\phi_{1}z-\phi_{2}z^{2}-\cdots-\phi_{p-1}z^{p-1}-\phi_{p}z^{p} \end{equation}\] are outside the unit circle in the complex plane \(\mathbb{C}\).

Proposition 14.2 (Invertible ARMA(p,q) processes) Under the assumption \(\alpha=\beta=0\), an \(ARMA(p,q)\) process \(\mathbf{X}\) is invertible if and only if the roots of the polynomial \[\begin{equation} 1-\theta_{1}z-\theta_{2}z^{2}-\cdots-\theta_{q-1}z^{q-1}-\theta_{q}z^{q} \end{equation}\] are outside the unit circle in the complex plane \(\mathbb{C}\).
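Both root conditions can be verified at once with `polyroot` (hypothetical coefficients):

```r
# Causality and invertibility checks for an ARMA(2,1) with hypothetical
# coefficients, via the root criteria of Propositions 14.1 and 14.2.
phi   <- c(0.5, -0.3)  # regression coefficients phi_1, phi_2
theta <- 0.4           # memory weight theta_1
ar_roots <- polyroot(c(1, -phi))    # roots of 1 - phi_1 z - phi_2 z^2
ma_roots <- polyroot(c(1, -theta))  # root of 1 - theta_1 z
c(causal = all(Mod(ar_roots) > 1), invertible = all(Mod(ma_roots) > 1))
# both entries are TRUE for these coefficients
```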

15 Autoregressive Conditional Heteroskedasticity Process - ARCH Processes

A frequently observed phenomenon in financial, and more generally economic, time series is volatility clustering. That is, a volatile (high-variance) period tends to be followed by another volatile period; in other words, volatile periods tend to cluster. Intuitively, the market becomes more volatile when important news breaks, and it takes some time for the market to fully digest it. Volatility clustering in the states of a time series implies time-varying conditional variance: a large volatility today may lead to a large volatility tomorrow. In 1982, Robert Engle developed the autoregressive conditional heteroskedasticity (ARCH) models to deal with this type of time-varying volatility. For this contribution, he won the 2003 Nobel Prize in Economics (Clive Granger shared the prize for co-integration; see http://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/2003/press.html). ARCH processes assume that the conditional variance of the current term is a function of the previous terms; in particular, it is related to the squares of the previous values of the process. This allows ARCH processes to have time-varying conditional variance while retaining a zero conditional mean. Therefore, although remaining unpredictable, ARCH processes can model volatility clustering.

Another important phenomenon is fat tails: financial time series appear to be distributed with a higher kurtosis than the Gaussian distribution. ARCH processes can also model this phenomenon, even under the assumption that the state innovation process is Gaussian distributed.

15.1 Autoregressive Conditional Heteroskedasticity Process of Order 1 - ARCH(1) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_0}\equiv\mathbf{X}\) be a real stochastic process on a probability space \(\Omega\). We assume that the random variable \(X_{0}\) has finite moment of order \(2\). We also assume \(\mathbf{E}\left[X_{0}\right]=0\) and we set \(\mathbf{D}^{2}\left[X_{0}\right]\equiv\sigma_{X_{0}}^{2}\).

Definition 15.1 (ARCH(1) processes) We say that \(\mathbf{X}\) is an autoregressive conditional heteroskedasticity process of order \(1\) if there exist \(\alpha_{0},\alpha_{1}\in\mathbb{R}_{++}\) and a real strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\) on \(\Omega\) with variance \(\sigma_{\mathbf{W}}^{2}\), for some \(\sigma_{\mathbf{W}}>0\), such that the random variables in \(\mathbf{W}\) are independent of \(X_{0}\) and the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\sigma_{t}W_{t}, \tag{15.1} \end{equation}\] for every \(t\in\mathbb{N}\), where \(\left(\sigma_{t}\right)_{t\in\mathbb{N}}\) is the positive process on \(\Omega\) given by \[\begin{equation} \sigma_{t}^{2}\overset{\text{def}}{=}\alpha_{0}+\alpha_{1}X_{t-1}^{2},\quad\forall t\in\mathbb{N}. \tag{15.2} \end{equation}\]
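The recursion (15.1)-(15.2) can be simulated directly. The sketch below uses hypothetical parameters, Gaussian innovations with \(\sigma_{\mathbf{W}}=1\), and starting point \(X_{0}=0\).

```r
# A minimal sketch: simulating an ARCH(1) path by the recursion (15.1)-(15.2),
# with hypothetical parameters alpha_0, alpha_1.
set.seed(1)
alpha_0 <- 0.2
alpha_1 <- 0.5   # with 3*alpha_1^2 < 1 the fourth moment is finite
t_max <- 1000
W <- rnorm(t_max)
X <- numeric(t_max)
x_prev <- 0      # starting point X_0 = 0
for (t in 1:t_max) {
  sigma2_t <- alpha_0 + alpha_1 * x_prev^2
  X[t] <- sqrt(sigma2_t) * W[t]
  x_prev <- X[t]
}
# The path shows volatility clustering, and its sample kurtosis typically
# exceeds 3 (fat tails) despite the Gaussian innovations.
```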

15.2 Autoregressive Conditional Heteroskedasticity Process of Order q - ARCH(q) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_0}\equiv\mathbf{X}\) be a real stochastic process on a probability space \(\Omega\). We assume that the random variable \(X_{0}\) has finite moment of order \(2\). We also assume \(\mathbf{E}\left[X_{0}\right]=0\) and we set \(\mathbf{D}^{2}\left[X_{0}\right]\equiv\sigma_{X_{0}}^{2}\).

Definition 15.2 (ARCH(q) processes) We say that \(\mathbf{X}\) is an autoregressive conditional heteroskedasticity process of order \(q\), for some \(q\in\mathbb{N}\), if there exist \(\alpha_{0},\alpha_{1},\dots,\alpha_{q}\in\mathbb{R}_{+}\) such that \(\alpha_{0}>0\) and \(\alpha_{q}>0\), and a real strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv\mathbf{W}\) on \(\Omega\) with variance \(\sigma_{\mathbf{W}}^{2}\), for some \(\sigma_{\mathbf{W}}>0\), such that the random variables in \(\mathbf{W}\) are independent of \(X_{0}\) and the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\sigma_{t}W_{t}, \tag{15.3} \end{equation}\] for every \(t\in\mathbb{N}\), where \(\left(\sigma_{t}\right)_{t\in\mathbb{N}}\) is the positive process on \(\Omega\) given by \[\begin{equation} \sigma_{t}^{2}\overset{\text{def}}{=}\alpha_{0}+\sum\limits_{s=1}^{q}\alpha_{s}X_{t-s}^{2},\quad\forall t\in\mathbb{N}. \tag{15.4} \end{equation}\]

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\mathbf{X}\) be an \(ARCH(q)\) process.

Remark (ARCH(q) processes). As an immediate consequence of Definition 15.2, we have \[\begin{align} & X_{1}=\sigma_{1}W_{1},\quad\sigma_{1}^{2}\overset{\text{def}}{=}\alpha_{0}+\alpha_{1}X_{0}^{2}\\ & X_{2}=\sigma_{2}W_{2},\quad\sigma_{2}^{2}\overset{\text{def}}{=}\alpha_{0}+\alpha_{1}X_{1}^{2}+\alpha_{2}X_{0}^{2}\\ & \vdots\\ & X_{t}=\sigma_{t}W_{t}, \quad\sigma_{t}^{2}\overset{\text{def}}{=}\alpha_{0}+\alpha_{1}X_{t-1}^{2}+\alpha_{2}X_{t-2}^{2}+\cdots+\alpha_{t}X_{0}^{2}, \quad\forall t<q\\ & \vdots\\ & X_{q}=\sigma_{q}W_{q}, \quad\sigma_{q}^{2}\overset{\text{def}}{=}\alpha_{0}+\alpha_{1}X_{q-1}^{2}+\alpha_{2}X_{q-2}^{2}+\cdots+\alpha_{q}X_{0}^{2}\\ & \vdots\\ & X_{t}=\sigma_{t}W_{t},\quad\sigma_{t}^{2}\overset{\text{def}}{=}\alpha_{0}+\alpha_{1}X_{t-1}^{2}+\alpha_{2}X_{t-2}^{2} +\cdots+\alpha_{q}X_{t-q}^{2}, \quad\forall t>q \end{align}\]
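The truncated recursion in the remark above can be sketched directly in R (hypothetical parameters, starting point \(X_{0}=0\); for \(t\leq q\) the sum is truncated at \(t\), as above).

```r
# A minimal sketch of the ARCH(q) recursion above, with q = 3 and
# hypothetical parameters.
set.seed(1)
alpha_0 <- 0.1
alpha <- c(0.3, 0.2, 0.1)  # alpha_1, ..., alpha_q
q <- length(alpha)
t_max <- 500
W <- rnorm(t_max)
X_hist <- 0                # realized values X_0, X_1, ...; X_0 = 0
for (t in 1:t_max) {
  lags <- rev(tail(X_hist, min(t, q)))  # X_{t-1}, X_{t-2}, ... (truncated)
  sigma2_t <- alpha_0 + sum(alpha[seq_along(lags)] * lags^2)
  X_hist <- c(X_hist, sqrt(sigma2_t) * W[t])
}
X <- X_hist[-1]            # the simulated path X_1, ..., X_{t_max}
```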

16 Generalized ARCH Processes - GARCH Processes

In generalized autoregressive conditional heteroskedasticity processes, the conditional variance of the process depends not only on the previous values of the states of the process but also on its own previous values.

16.1 Generalized ARCH Processes of Orders 1 and 1 - GARCH(1,1) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_0}\equiv\mathbf{X}\) be a real stochastic process on a probability space \(\Omega\). We assume that the random variable \(X_{0}\) has finite moment of order \(2\). We also assume \(\mathbf{E}\left[X_{0}\right]=0\) and we set \(\mathbf{D}^{2}\left[X_{0}\right]\equiv\sigma_{X_{0}}^{2}\).

Definition 16.1 (GARCH(1,1) processes) We say that \(\mathbf{X}\) is a generalized autoregressive conditional heteroskedasticity process of orders \(1\) and \(1\), if there exist \(\alpha_{0},\alpha_{1},\beta_{1}\in\mathbb{R}_{++}\), a positive random variable \(\sigma_{0}\), and a real strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv \mathbf{W}\) on \(\Omega\), such that the random variables in \(\mathbf{W}\) are independent of \(X_{0}\) and \(\sigma_{0}\) and the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\sigma_{t}W_{t}, \tag{16.1} \end{equation}\] for every \(t\in\mathbb{N}\), where \(\left(\sigma_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\sigma\) is the positive process given by \[\begin{equation} \sigma_{t}^{2}\overset{\text{def}}{=} \alpha_{0}+\alpha_{1}X_{t-1}^{2}+\beta_{1}\sigma_{t-1}^{2},\quad\forall t\in\mathbb{N}. \tag{16.2} \end{equation}\]
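The recursion (16.1)-(16.2) can be sketched as follows. The parameters are hypothetical and satisfy \(\alpha_{1}+\beta_{1}<1\); \(\sigma_{0}^{2}\) is set to the stationary variance \(\alpha_{0}/\left(1-\alpha_{1}-\beta_{1}\right)\), and \(X_{0}=0\).

```r
# A minimal sketch of the GARCH(1,1) recursion (16.1)-(16.2), with
# hypothetical parameters and Gaussian innovations.
set.seed(1)
alpha_0 <- 0.1; alpha_1 <- 0.15; beta_1 <- 0.8
t_max <- 1000
W <- rnorm(t_max)
X <- numeric(t_max)
x_prev <- 0
sigma2_prev <- alpha_0 / (1 - alpha_1 - beta_1)  # sigma_0^2
for (t in 1:t_max) {
  sigma2_t <- alpha_0 + alpha_1 * x_prev^2 + beta_1 * sigma2_prev
  X[t] <- sqrt(sigma2_t) * W[t]
  x_prev <- X[t]
  sigma2_prev <- sigma2_t
}
```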

16.2 Generalized ARCH Processes of Orders p and q - GARCH(p,q) Processes

Let \(\left(X_{t}\right)_{t\in\mathbb{N}_0}\equiv\mathbf{X}\) be a real stochastic process on a probability space \(\Omega\). We assume that the random variable \(X_{0}\) has finite moment of order \(2\). We also assume \(\mathbf{E}\left[X_{0}\right]=0\) and we set \(\mathbf{D}^{2}\left[X_{0}\right]\equiv\sigma_{X_{0}}^{2}\).

Definition 16.2 (GARCH(p,q) processes) We say that \(\mathbf{X}\) is a generalized autoregressive conditional heteroskedasticity process of orders \(p\) and \(q\), for some \(p,q\in\mathbb{N}\), if there exist \(\alpha_{0},\alpha_{1},\dots,\alpha_{q},\beta_{1},\beta_{2},\dots,\beta_{p}\in\mathbb{R}_{+}\) such that \(\alpha_{0}>0\), \(\alpha_{q}>0\), and \(\beta_{p}>0\), a positive random variable \(\sigma_{0}\), and a real strong white noise \(\left(W_{t}\right)_{t\in\mathbb{N}}\equiv \mathbf{W}\) on \(\Omega\), such that the random variables in \(\mathbf{W}\) are independent of \(X_{0}\) and \(\sigma_{0}\) and the random variables in \(\mathbf{X}\) satisfy the equation \[\begin{equation} X_{t}=\sigma_{t}W_{t}, \tag{16.3} \end{equation}\] for every \(t\in\mathbb{N}\), where \(\left(\sigma_{t}\right)_{t\in\mathbb{N}_{0}}\equiv\sigma\) is the positive process given by \[\begin{equation} \sigma_{t}^{2}\overset{\text{def}}{=}\left\{ \begin{array} [c]{ll} \alpha_{0}+\alpha_{1}X_{0}^{2}+\beta_{1}\sigma_{0}^{2}, & \text{if }t=1,\\ \alpha_{0}+\sum\limits_{s=1}^{t}\alpha_{s}X_{t-s}^{2}+\sum\limits_{s=1}^{t}\beta_{s}\sigma_{t-s}^{2}, & \text{if }1<t<\min\left\{ p,q\right\},\\ \alpha_{0}+\sum\limits_{s=1}^{t}\alpha_{s}X_{t-s}^{2}+\sum\limits_{s=1}^{p}\beta_{s}\sigma_{t-s}^{2}, & \text{if }p\leq t<q,\\ \alpha_{0}+\sum\limits_{s=1}^{q}\alpha_{s}X_{t-s}^{2}+\sum\limits_{s=1}^{t}\beta_{s}\sigma_{t-s}^{2}, & \text{if }q\leq t<p,\\ \alpha_{0}+\sum\limits_{s=1}^{q}\alpha_{s}X_{t-s}^{2}+\sum\limits_{s=1}^{p}\beta_{s}\sigma_{t-s}^{2}, & \text{if }t\geq\max\left\{p,q\right\}. \end{array} \right. \tag{16.4} \end{equation}\]
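The case distinctions in Equation (16.4) amount to a single recursion whose sums are truncated for small \(t\). The sketch below uses hypothetical parameters (\(p=1\), \(q=2\)), a constant \(\sigma_{0}^{2}\), and starting point \(X_{0}=0\).

```r
# A minimal sketch of Equation (16.4) as one truncated recursion,
# with hypothetical parameters (p = 1, q = 2).
set.seed(1)
alpha_0 <- 0.05
alpha <- c(0.1, 0.05)   # alpha_1, ..., alpha_q
beta  <- 0.6            # beta_1, ..., beta_p
q <- length(alpha); p <- length(beta)
t_max <- 500
W <- rnorm(t_max)
X_hist  <- 0            # X_0 = 0
s2_hist <- 0.5          # sigma_0^2 (hypothetical constant)
for (t in 1:t_max) {
  x_lags  <- rev(tail(X_hist,  min(t, q)))   # X_{t-1}, X_{t-2}, ...
  s2_lags <- rev(tail(s2_hist, min(t, p)))   # sigma_{t-1}^2, ...
  s2_t <- alpha_0 + sum(alpha[seq_along(x_lags)] * x_lags^2) +
    sum(beta[seq_along(s2_lags)] * s2_lags)
  X_hist  <- c(X_hist, sqrt(s2_t) * W[t])
  s2_hist <- c(s2_hist, s2_t)
}
X <- X_hist[-1]         # the simulated GARCH(1,2) path
```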

knitr::knit_exit()